
Responsible AI in Consumer Enterprise

A framework to help organizations operationalize ethics, privacy, and security as they apply machine learning and artificial intelligence


    Acknowledgements

This framework has benefited from input and feedback from a diverse group of enterprise executives, product designers, machine learning scientists, security engineers, lawyers, academics, and community advocates. The first draft was released for peer review at the 2018 RightsCon conference in Toronto.

We’d like to extend a special thanks to the following individuals and institutions for supporting our efforts and providing helpful comments to make the framework as useful and accurate as possible:

    ALTIMETER INSTITUTE

    Susan Etlinger

    CORNELL TECH

    Helen Nissenbaum

    CORUS MEDIA

    Anne Harrop

    Jane Harrison

    GEORGIAN PARTNERS

    Jason Brenier

    Nick Chen

    Parinaz Sobhani

    Jon Prial

    GOOGLE

    Kevin Swersky

    INFORMATION AND PRIVACY COMMISSIONER OF ONTARIO

    David Weinkauf

    LOBLAW DIGITAL

    Richard Downe

    MCCARTHY TÉTRAULT LLP

    Adam Goldenberg

    Carole Piovesan

    MICROSOFT

    Andree Gagnon

    OSLER, HOSKIN & HARCOURT LLP

    Adam Kardash

    Patricia Kosseim

    PRIVACY BY DESIGN CENTRE OF EXCELLENCE, RYERSON UNIVERSITY

    Ann Cavoukian

    RIGHTSCON

    Brett Solomon

    Melissa Kim

    SCOTIABANK

    Mike Henry

    Daniel Moore

    Michael Zerbs

Dubie Cunningham

    Veeru Ramaswamy

    TELUS

    Pam Snively

    Elena Novas

    THE ENGINE ROOM

    Alix Dunn

    VECTOR INSTITUTE AT THE UNIVERSITY OF TORONTO

    Richard Zemel

    Cameron Schuler

    Frank Rudzicz

    Marc-Etienne Brunet

    Elliot Creager

    Jesse Bettencourt

    Will Grathwohl


Foreword

Data has become a business-critical asset, and organizations across all sectors are recharacterizing themselves as “data companies.” There is an infinite opportunity for organizations to effectively leverage and unlock the value inherent in their data repositories. Companies that deploy artificial intelligence to derive meaningful insights from their data holdings will be the successful innovators of tomorrow. But to achieve true success, organizations must implement the guardrails needed for responsible data use, as the long-term sustainability of any enterprise is predicated on trust. For data companies, the respectful and ethical treatment of data has become a core feature of any trust model.

The concept of data ethics is still in its formative stages and requires active, informed, and multi-stakeholder discussion. Integrate.ai should be commended for developing this framework, which will help facilitate a structured conversation about the ethical considerations and broader economic and social impacts of AI data initiatives.

ADAM KARDASH
Chair, Privacy and Data Management, Co-Leader of AccessPrivacy, by Osler

PATRICIA KOSSEIM
Counsel, Privacy and Data Management, Co-Leader of AccessPrivacy, by Osler


Executive Summary

Artificial intelligence (AI) may be the biggest and most disruptive technology advance we see in our lifetimes. The field has been around for a long time, but a phase shift has occurred over the past five years thanks to faster computation, smarter algorithms, and, most importantly, exponential growth in data.

The subfield of AI having the greatest impact in the enterprise is machine learning: software systems that learn from data and experience. As Amazon CEO Jeff Bezos said in his 2017 letter to shareholders: “Over the past decades computers have broadly automated tasks that programmers could describe with clear rules and algorithms. Modern machine learning techniques now allow us to do the same for tasks where describing the precise rules is much harder.”

But machine learning does more than automate existing business processes: it changes how businesses form and strengthen relationships with customers. Using data and machine learning, businesses can turn every interaction into an opportunity to learn what people want and value. At a macro level, machine learning can optimize margins, directing spend and human resources to those customers where outreach and engagement would lead to the highest return.

There are risks. AI requires that enterprises use customer data in new ways, expanding responsibilities to customers to include appropriate data use. People feel shocked when they learn that their sensitive information was leaked, suspicious when they sense businesses want to manipulate their behavior, and powerless when an automated system denies them a product without any explanation for why. Trust is not a constant: it is earned over years and lost in an instant.

Executive leadership is ultimately responsible for striking the right balance between business risk (both legal and reputational) and opportunity. Leaders need a clear mental model for what AI can and cannot do, and a means to effectively arbitrate between business and risk stakeholders to make the right decisions for the business. This is increasingly difficult in a world where major technology advances like AI challenge existing decision-making models.

This framework presents the privacy, security, and ethics choices businesses face when using machine learning on consumer data. It breaks things down into the various small decisions teams need to make when building a machine learning system. It is an agile approach to ethics and risk management that aligns with agile software development practices. Businesses waste time if governance or ethics reviews start only after systems are built. When done well, accountability quickens rather than slows innovation: business and risk teams need to make contextual calls about what constraints are required when, and clearly define desired business outcomes. The scientists’ job is to apply the best algorithms to optimize for these goals. There’s no silver bullet. Contextual judgment calls early on can move mountains.

This framework is neither a regulatory compliance compendium nor an exhaustive list of risk management controls. It is a tool to help businesses applying AI think about ethics and risk contextually. It provides detailed insights for implementation teams and high-level questions for executive leadership.

Expanding governance to include ethics can change employee mindsets towards governance and compliance. Ethics ignites values and empathy, the things that make us human and motivate us to do good work. Sustainable innovation means incentivizing risk professionals to act for quick business wins and showing business leaders why fairness and transparency are good for business. Building for accountability will force cross-functional teams to empathize with one another and communicate better. This alone will be a win.


How to Use This Framework

This framework is designed to operationalize ethics in machine learning systems.

Operationalizing ethics starts with breaking down how machine learning systems are built and how they work. The framework uses the different steps in the system-development process as its organizing principle and localizes ethics and governance questions so they can be addressed quickly and in parallel with agile development. Deciding to cut a project early on because it is too risky or poses ethical concerns frees teams up to focus on other things (and practice their values in the process). Business, risk, science, and technical teams need to communicate continuously to ensure scientists optimize for the right set of constraints and goals and business teams understand what is and is not possible. Doing ethics up front can open up the creative potential of your business.

Guiding Principles
The framework starts with our guiding principles, the intuitions everyone in your business, including executive management, should internalize to inform risk-based thinking and ethical decisions.

People, Processes, and Communication
Next come dos and don’ts about people, processes, and communication recommended to make ethics efforts successful. Use this as a checklist to think about your team and organizational structure.

How Machine Learning Systems Work
After that is an overview of how machine learning systems work and some common machine learning applications in consumer enterprise. Use this like a glossary to align on definitions and level-set expectations.

Framework Summary
A framework summary follows, breaking down the different steps in the machine learning system development process and indicating the jobs to be done and ethical questions to be considered at each step.1 Rely on this table as your legend, map, and guide. Some readers may only ever use this table.

Context at Each Phase
The body of the document provides further questions and additional context at each phase in the machine learning system development process. Privacy, security, fairness, explainability, and transparency issues are considered at each phase, including anecdotes and examples. The framework is systematic, but if a given category (e.g., security) is not relevant at a given phase, it is left out. Comprehensive guidance on security, privacy, compliance, or legal risk management issues is out of scope, but footnotes include references to additional resources. Use this to think deeper about a particular topic and to guide questions and decisions.

1 This is a high-level outline designed to focus attention on ethics and risk. For further information about machine learning workflows, we recommend the Georgian Partners Principles of Applied Artificial Intelligence white paper and the O’Reilly Development Workflows for Data Scientists eBook.

The content in this document does not constitute legal advice. Readers seeking to comply with privacy or data protection regulations are advised to seek specific legal advice from lawyers.


Guiding Principles

This framework is not a list of procedures and controls. It’s a tool to think critically about privacy, security, and ethics. It will help you ask the right questions to the right people at the right time, but you will have to assess risks and make tradeoffs. It’s helpful for everyone in the business to share the same intuitions around what matters.

Responsible AI Principles:

• In standard practice, machine learning assumes the future will look like the past. When the past is unfair or biased, machine learning will propagate these biases and amplify them through feedback loops. If you want the future to look different from the past, you need to design systems with that in mind. You can’t just let the data guide you. Executive leadership should decide what future outcomes the business wants to achieve. These include fairness, not just profit.

• The outcomes businesses want to optimize for are often hard to measure or occur far in the future (e.g., customer lifetime value). Businesses therefore resort to easier-to-measure proxies that stand in for desired outcomes. Be clear about what these proxies do and don’t optimize. You may learn they exacerbate bias or have downstream consequences that conflict with values or goals.

• All of your customers are individuals. Representing them as data points necessarily transforms them from people into abstractions. When you deal with abstractions and groupings, you run the risk of treating humans unethically.

• Beware of correlations that mask sensitive data behind benign proxies (a simple audit sketch follows below). For example, postal code/zip code is often a proxy for ethnic background. If your machine learning system uses location in decisions, you may end up treating different ethnic groups differently.

• Context is key for explainability and transparency. Systems that decide who gets a credit card or loan require more scrutiny than systems that personalize marketing offers. Business and risk teams should assess context and communicate required constraints to technology teams.

• Privacy is not just about personal data, notices or consent forms, or a set of controls to minimize data use. It is about appropriate data flows that conform to social norms and expectations. Map these flows and ask if people would be surprised to learn how their data has been used.

• Accountability is a marathon, not a sprint. Once in production, machine learning systems often make errors on populations that are less well represented in training data. Develop a plan to catch and fix these errors. “Govern the optimizations. Patrol the results.”2

• There is no silver bullet to responsible AI. It takes critical thinking and teamwork. Step outside the walls of your organization and ask communities and customers what matters to them.

2 Weinberger, David. “Optimization over Explanation: Maximizing the benefits of machine learning without sacrificing its intelligence.” https://medium.com/berkman-klein-center/optimization-over-explanation-41ecb135763d
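To make the proxy principle above concrete, here is a minimal audit sketch in Python. The file and column names are hypothetical; a real audit would use your own customer table and whatever sensitive attributes you can lawfully observe or estimate:

```python
import pandas as pd

# Hypothetical customer table; the file and columns are illustrative.
df = pd.read_csv("customers.csv")  # columns include: postal_code, ethnicity

# One rough proxy check: how concentrated is the sensitive attribute within
# each value of the "benign" feature? For each postal code, take the share
# of its most common ethnicity; values near 1.0 mean postal code nearly
# encodes ethnicity, so a model using location may treat groups differently.
leakage = (
    df.groupby("postal_code")["ethnicity"]
      .agg(lambda s: s.value_counts(normalize=True).iloc[0])
)
print(leakage.describe())  # a high median flags postal code as a strong proxy
```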


People, Processes, and Communication

Enterprises don’t change overnight. The only way to build an ethics muscle is to execute a project large enough to matter (not a research project) and small enough to permit learning without disruption (not a two-year data lake migration). Practice will reveal what works and what doesn’t work for your culture.

Here are some dos and don’ts for people, processes, and communication to execute ethics in practice:

DO

• Business and risk teams learn by doing through work on a machine learning project. Put a privacy stakeholder on a machine learning project working closely with development teams. Seek vendors or consultants who will push your team to learn and grow their skills.

• Engage cross-functional teams in ethics decisions throughout system development. Business goals and ethics checks should guide technical choices; technical feasibility should influence scope and priorities; executives should set the right incentives and arbitrate stalemates.

• Use risk-based, contextual thinking to evaluate and prioritize technical controls. You can’t do everything at once. Prioritize time and effort based on how significant the consequences to your business and your customers would be if something went wrong.

• Include diverse and external points of view in ethics decisions. You minimize risk and find new opportunities by engaging customers and communities who will be affected by machine learning models.

• Insist that technical stakeholders communicate risks in plain language. Not everyone will be a scientist or security expert. Clear communication is required for critical thinking and judgment calls to take place.

DON’T

• Wait to recruit talent with expertise in ethics and risk related to machine learning. Business professionals with this expertise are even scarcer than highly coveted technical researchers. Delaying ethics until you find the right talent will stall innovation.

• Delegate an ethics review to a fixed part of the organization at the end of a project. Engaging unbiased reviewers without conflicts of interest is a good idea, but for ethics to live and breathe it must be an ongoing priority for the teams that create the systems as well.

• Use one-size-fits-all checklists or rely on vague policies or principles. Privacy and ethics risks vary between applications. Granular analysis and critical thinking are required for ethics in practice.

• Appoint an ethics committee that only includes executive leadership. This removes accountability from teams. Leadership can arbitrate decisions and set policies, but decentralized accountability is key.

• Grant functional teams the authority to stop progress without explaining why. With ambiguity, it’s easier to say no than yes and to use jargon to cover uncertainty. Risks should be tied back to business goals.


Agile ethics is a process to operationalise values, iteratively identify and address ethical challenges of your innovations before you send them out into the world, and adaptively refine processes so that as the capabilities of technology evolve, so does your ability to diagnose and prevent harm before it happens.

Agile ethics is the explicit application of agile methods to ethical assessment, adaptation, and learning that allows a team to mature its practices as it works at the bleeding edge. It employs agile methods to tackle ethical challenges, and inculcates ethical approaches within an agile development process.

It is not:

• Delegating ethical analysis to a fixed part of an organisation (compliance, research, corporate social responsibility, or otherwise)
• Prescribing a fixed approach to ethics based on vague values (policies won’t cut it)
• Encouraging your staff to reflect on ethics without designing methods or allocating resources to aid them in the process
• A bandaid for underlying problems with governance and leadership

Agile ethics requires four things:

• Decentralisation of critical, ethical thinking in an organisation or team
• Iterative development of process that supports decentralised consideration of the implications of any given idea
• Dedicated staff time to manage the process
• Inclusion of diverse and (when appropriate) external voices

ALIX DUNN
Executive Director, The Engine Room

The Engine Room is a UK-based firm that helps activists, organisations, and other social change agents make the most of data and technology.


How Machine Learning Systems Work

Before you start addressing the ethics and risks of machine learning, it helps if everyone shares a common understanding of what machine learning systems do and how they work. This doesn’t mean that everyone needs to become a machine learning scientist and grasp the nuances of different algorithms. They just need some grounded intuitions to ask good questions.

Machine learning systems create useful mappings between inputs and outputs. The mappings, called models, are mathematical functions, equations of the form y = mx + b, where x is an input and y is an output (just that the equations can be much more complicated!). You could use hand-written rules to define those mappings, but rules take a lot of time to write and usually don’t handle a lot of cases.3 With machine learning, computer programmers no longer write and update the mappings between inputs and outputs: computers learn these mappings from data. So, in y = mx + b, the computer learns what values “m” and “b” should be after seeing lots of x’s and y’s. When the system is presented with new inputs it hasn’t seen before, it uses the mappings it’s learned to make a useful guess about the corresponding output. These mappings aren’t certain, and they don’t always generalize perfectly to new data.
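As a minimal illustration of “learning m and b from data,” here is a sketch in Python with NumPy; the training numbers are made up for the example:

```python
import numpy as np

# Illustrative training data: the true mapping is y = 2x + 1, plus noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0 + np.random.normal(scale=0.1, size=x.shape)

# Learn m and b by least squares instead of writing them by hand.
m, b = np.polyfit(x, y, deg=1)

# Use the learned mapping to make a useful guess for an unseen input.
print(f"learned m={m:.2f}, b={b:.2f}, prediction for x=5: {m * 5.0 + b:.2f}")
```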

Most machine learning applications boil down to making a prediction about the future (How likely is it that this individual will become a profitable customer?) or classifying data into useful categories (Is this email spam? Is this cell phone stationary or in transit?). Emerging systems that do things like schedule meetings, make phone calls, or write emails on our behalf can produce a range of possible outputs rather than just an output with a clear right or wrong answer (like correctly saying what object is in an image) or a strict binary decision (Will this customer churn or not?).

Common machine learning applications in consumer enterprise

Recommendation Systems: Compare actions of consumers to infer similar taste, or suggest affinity between consumers and products based on attributes and actions.

Audience Segmentation: Separate consumers into groups that look like one another in a way that is relevant for marketing or product performance.

Personalization: Modify the experience of a product, marketing message, or channel to best resonate with a consumer, at a scale too large for human teams to execute.

Chatbots: Help customers answer questions, resolve problems, or identify the right product mix, redirecting human resources to higher-value interactions that require judgment.

Risk Assessments: Modify the offer and pricing on an insurance or banking product according to predicted risk or likelihood to default.

Anomaly Detection: Identify a shift in customer behavior that could signal an opportunity for upsell or a risk of churn, or a shift in network or system behavior that could signal malicious activity.

Anti-money Laundering and Compliance: Identify suspicious behavior or attributes and automate compliance reporting workflows using natural language generation.

Data Products: Use algorithms to identify useful insights about consumer behavior that are packaged and sold to other businesses for targeted marketing.

3 The game of Go, for example, has more than 10⁸⁰ possible game outcomes. That is more possible games than there are atoms in the universe! It would take a prohibitively long amount of time for a computer programmer to encode all the possible combinations by hand; machine learning systems can learn useful game strategies from data of past human players (or, in the recent AlphaGo Zero system, through iterative self-play).


    When machine learning is incorporated into a business process,businesses must design how to transform a model output into anaction. Feedback loops happen when businesses keep track of thedifference between expected versus actual outcomes and use thisdifference to improve prediction accuracy over time.

    Consider the following example. Kanetix.ca, an online insuranceaggregator, uses the Integrate.ai platform to guess how likelysomeone is to purchase an insurance product and then surfaceincentives in real time to customers who could use a nudge. Theirbusiness goal is to focus marketing spend where it will have themost impact: that is, on customers who are decently likely to buybut who aren’t entirely sure.

    The input is information about a customer, web behavior, andinformation the customer enters into forms (like household size,age, kind of car, etc.). The input is sent to the machine learningmodel, which has been trained on historical data about customerpurchases, defining mappings between customer attributes andpurchases. The model outputs a score—a number in the range of 0 to1—of the customer’s likelihood to convert. This score is thenresolved into an action: do we surface an incentive or not? Thesystem keeps track of whether the expected outcome (guess that theperson will convert) materialized as the actual outcome (data thatthe person converted) and updates the model with this newinformation.
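A minimal sketch of the “translate prediction into action” step in an example like this follows; the thresholds and function name are illustrative, not Kanetix’s or Integrate.ai’s actual logic:

```python
def choose_action(conversion_score: float) -> str:
    """Resolve a model score (0 to 1) into a business action.

    Illustrative thresholds: spend marketing budget on customers who are
    decently likely to buy but not already sure (the "persuadables").
    """
    if 0.4 <= conversion_score < 0.8:
        return "surface_incentive"   # a nudge may change this person's behavior
    return "no_incentive"            # unlikely and near-certain buyers get no spend

# Example: the model scores a visitor at 0.75, so an incentive is surfaced.
print(choose_action(0.75))
```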

Ethics and risk questions arise across the machine learning system workflow. Say your system automates decisions on granting people housing loans. Does your historical data create a mapping that frequently denies loans to black people? Could a malicious attacker reverse engineer your model to access sensitive personal data? Can you explain why you denied someone a loan? If the score changes over time, can you reconstruct old mappings that have been updated and replaced?

The rest of this framework examines these questions in detail, breaking them down according to the different tasks that go into building a machine learning system.

[Figure: the four steps of the example workflow. 1. Build the model. 2. Translate the prediction into action: a conversion likelihood of 0.75 resolves into either giving or not giving the individual an incentive. 3. Program the rules into an API behind a personalized web interface. 4. Score and learn: did the actual outcome meet the expected outcome? The model is retrained based on results. Incentives are costly: optimizing their effectiveness focuses spend on those individuals for whom an incentive will change behavior.]


    The Responsible AI Framework


Framework Summary

The responsible AI framework breaks down the steps used to build a machine learning system and highlights privacy, security, and ethics questions teams should consider at each step. Inspired by Privacy by Design, it is characterized by proactive rather than reactive measures to privacy and ethics, and embeds critical thinking and controls into the design and architecture of machine learning systems. View this as an agile process with multiple iterations and decision points, not a waterfall process that plans everything in advance. You may discover you want to cut a project because you lack sufficient training data, require greater certainty to foster adoption, or have identified ethical concerns. Learn that quickly and free up resources to do something else. Remember that cross-functional teams should participate in most meetings (or at least have regular check-ins) throughout the process, in particular during the scoping phase.

[Figure: the ML system development process, with a feedback loop: the model improves once it is in production.]


Step 1: Problem Definition & Scope

Jobs to Be Done:
• Map current business process
• Identify where a machine learning system adds value or alters the process
• Define inputs, outputs, and what you are optimizing for
• Measure baseline performance and expected lift

Risk & Ethics Questions:
• How could your system negatively impact individuals? Who is most vulnerable and why?
• How much error in predictions can your business accept for this use case?
• Will you need to explain which input factors had the greatest influence on outputs?
• Do you need personally identifiable information (PII) or can you provide group-level insights?

Step 2: Design

Jobs to Be Done:
• Analyze a user flow to understand how data is collected and where users hesitate on what to input
• Decide whether this will be a fully-automated or human-in-the-loop system
• Interview users and apply human-centric principles to understand their experience
• Design how model outputs will translate into insight or action for internal users/external consumers

Risk & Ethics Questions:
• How can you make data collection procedures transparent to consumers?
• Will the formats you use to collect data alienate anyone?
• How will you enable end users to control use of their data?
• Should you make it clear to users when they engage with a system and not a human?

Step 3: Data Collection & Retention

Jobs to Be Done:
• Conduct a data census to identify what data you have and what data you need
• Procure second- and third-party data sets
• Align machine learning training needs with the data retention schedule

Risk & Ethics Questions:
• How will you manage the provenance of third-party data?
• Who are the underrepresented minorities in your data set?
• If a vendor processes your data, have you ensured it has appropriate security controls?
• Have you de-identified your data and taken measures to reduce the probability of re-identification?

Step 4: Data Processing

Jobs to Be Done:
• Format and process the data to prepare it for machine learning algorithms
• Pair subject matter experts with scientists to help understand the data and the features that matter for predictions

Risk & Ethics Questions:
• Will socially sensitive features like gender or ethnic background influence outputs?
• Are seemingly harmless features like location hiding proxies for socially sensitive features?

Step 5: Model Prototyping & QA Testing

Jobs to Be Done:
• Experiment with various algorithms to verify the problem can be solved and select the approach that performs best
• Test model performance on a reserved test data set to verify functionality beyond the training set

Risk & Ethics Questions:
• Does your use case require a more interpretable algorithm?
• Should you be optimizing for a different outcome than accuracy to make your outcomes fairer?
• Is it possible that a malicious actor has compromised training data and created misleading results?
• Can a malicious actor infer information about individuals from your system?

Step 6: Deployment, Monitoring & Maintenance

Jobs to Be Done:
• Integrate model outputs into the business process
• Capture data on outcomes and provide feedback back to the system
• Define model retraining frequency (batch or real-time) and how scientists evaluate future model changes
• Monitor the system for failures or bugs and update code regularly
• Measure and report on results

Risk & Ethics Questions:
• Are you able to identify anomalous activity on your system that might indicate a security breach?
• Do you have a plan to monitor for poor performance on individuals or subgroups?
• Do you have a plan to log and store historical predictions if a consumer requests access in the future?
• Have you documented model retraining cycles, and can you confirm that a subject’s data has been removed from models?


1. Problem Definition & Scope

Like any initiative, machine learning projects start with ideation and project evaluation, including assessments of technical feasibility, scope, desired outcomes, and projected return on investment. Don’t underestimate the importance of this work: there’s a fallacy in thinking that being data-driven starts with finding insights in data. It starts with the thinking that goes into defining a rigorous hypothesis that can be explored using mathematical models. Subject matter experts bring valuable information to the table and can import what they know into system and process design to get to results faster. Machine learning systems are tools to optimize against a set of defined outcomes; it’s up to humans to define which outcomes to optimize for.

While an AI ethics assessment may seem like an entirely new process, ethics simply refers to norms of behavior within a product or service. As a result, AI ethics assessments should focus on the implications of machine learning on decision making, KPIs, transparency, trust, and ultimately the customer experience as a whole.

SUSAN ETLINGER
Industry Analyst, Altimeter

PRIVACY

Many enterprise machine learning applications will not raise new privacy issues (e.g., automating contract due diligence with natural language processing). Applications that collect data directly from consumers should be subject to a privacy review. Technical stakeholders should opine on whether the system needs granular, personally identifiable information (PII) to function optimally, and how system performance would be impacted if PII were replaced with aggregates. For example, this might entail a tradeoff between offers that are personalized to each individual versus offers tailored to consumer segments that share common attributes.

Conduct a privacy impact assessment (PIA) when you start a project to align on what’s at stake.4 You may need to revise your PIA template to include risks related to inferred traits about individuals, not just PII (see the section on data processing). Take a risk-based approach to managing PIAs with third-party vendors, focusing more rigorous review on vendors with higher business risk.

4 Multiple privacy regulators have resources and guidance around privacy impact assessments. We recommend Canadian companies start with resources from the Office of the Privacy Commissioner of Canada: https://www.priv.gc.ca/en/privacy-topics/privacy-impact-assessments.



When scoping your use case and beginning to design your system, apply the principles of Privacy by Design, a framework developed by Dr. Ann Cavoukian, recognized as an international standard, and available in 40 languages. Privacy and Data Protection by Design are the underpinnings of new regulations like GDPR.

Principle 1: Proactive not reactive; preventative not remedial
The Privacy by Design (PbD) framework is characterized by the taking of proactive rather than reactive measures. It anticipates the risks and prevents privacy-invasive events before they occur. PbD does not wait for privacy risks to materialize, nor does it offer remedies for resolving privacy infractions once they have occurred: it aims to identify the risks and prevent the harms from arising. In short, Privacy by Design comes before the fact, not after.

Principle 2: Privacy as the default setting
We can all be certain of one thing: the default rules! Privacy by Design seeks to deliver the maximum degree of privacy by ensuring that personal data are automatically protected as the default in any given IT system or business practice. If an individual does nothing, their privacy still remains intact. No action is required on the part of the individual in order to protect their privacy; it is already built into the system, by default.

Principle 3: Privacy embedded into design
Privacy measures are embedded into the design and architecture of IT systems and business practices. These are not bolted on as add-ons, after the fact. The result is that privacy becomes an essential component of the core functionality being delivered. Privacy is thus integral to the system, without diminishing functionality.

Principle 4: Full functionality: positive-sum, not zero-sum
Privacy by Design seeks to accommodate all legitimate interests and objectives in a positive-sum “win-win” manner, not through the dated, zero-sum (either/or) approach, where unnecessary trade-offs are made. Privacy by Design avoids the pretense of false dichotomies, such as privacy vs. security, demonstrating that it is indeed possible to have both.

Principle 5: End-to-end security: full lifecycle protection
Privacy by Design, having been embedded into the system prior to the first element of information being collected, extends securely throughout the entire lifecycle of the data involved: strong security measures are essential to privacy, from start to finish. This ensures that all data are securely collected, used, retained, and then securely destroyed at the end of the process, in a timely fashion. Thus, Privacy by Design ensures cradle-to-grave, secure lifecycle management of information, end-to-end.



Principle 6: Visibility and transparency: keep it open
Privacy by Design seeks to assure all stakeholders that whatever the business practice or technology involved, it is in fact operating according to the stated promises and objectives, subject to independent verification. The data subject is made fully aware of the personal data being collected, and for what purpose(s). All the component parts and operations remain visible and transparent, to users and providers alike. Remember, trust but verify!

Principle 7: Respect for user privacy: keep it user-centric
Above all, Privacy by Design requires architects and operators to keep the interests of the individual uppermost by offering such measures as strong privacy defaults, appropriate notice, and empowering user-friendly options. The goal is to ensure user-centred privacy in an increasingly connected world. Keep it user centric.

Privacy by Design is a framework that restores personal control over one’s data to the individual to whom the data pertain. There are two essentials to Privacy by Design. It is a model of prevention: PbD is predicated on proactively embedding privacy-protective measures into the design of one’s operations, in an effort to prevent the privacy harms from arising. It also calls for a rejection of zero-sum, win/lose models: it calls for privacy AND data utility; privacy AND business interests; privacy AND AI. Positive gains must be obtained on both sides: win/win! That is the essence of Privacy by Design.

ANN CAVOUKIAN
Distinguished Expert-in-Residence, Ryerson University


SECURITY

Security is not an absolute. There will always be some risk. The goal is to reduce risk to a level acceptable to the business and to have a plan to contain and mitigate any incidents that occur. A risk-based approach to security focuses resources first on information or technical assets that are critical to the business, and analyzes threat, consequence, and vulnerability to prioritize efforts. A variety of mathematical models are available to calculate risk and to illustrate the impact of increasing protective measures on the risk equation.5 As with PIAs, vendors working with more sensitive data should be subjected to more rigorous review and standards than those with lower risk and impact to the business.

5 The ISO/IEC 27000 family of standards on information security management systems is a widely-adopted framework for conducting risk assessments and evaluating holistic controls.
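For illustration only (the framework does not prescribe a specific formula), one simple risk model of the kind referenced above scores each asset as threat × vulnerability × consequence:

```python
# A minimal illustrative risk-scoring model; the factors and 1-5 scale are
# assumptions for this sketch, not a standard mandated by the framework.
def risk_score(threat: int, vulnerability: int, consequence: int) -> int:
    """Risk = threat x vulnerability x consequence, each rated 1-5 (max 125)."""
    return threat * vulnerability * consequence

# Compare two hypothetical assets to prioritize protective effort:
print(risk_score(threat=4, vulnerability=3, consequence=5))  # customer PII store: 60
print(risk_score(threat=2, vulnerability=2, consequence=1))  # internal test data: 4
```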



ETHICS

There are three ethics issues to consider during the scoping phase.

First, give customers a voice when considering impact (versus focusing success metrics on the bottom line). Be prepared to address what may later be conflicting metrics. For example, a business may deploy a machine learning tool with the expected goals of increasing customer satisfaction and reducing call time to service more calls. This masks a hidden assumption that customers want short calls. Short and sweet conversations may appeal more to busy professionals than to older customers who enjoy friendly conversation. A diverse stakeholder group should participate in a design session prior to launching a product to consider a broad set of potential outcomes and customer experiences.

Next, evaluate whether the system will “produce legal effects concerning [individuals impacted by the system] or similarly significantly affect [individuals].”6 The European General Data Protection Regulation (GDPR), in effect since May 25, 2018, stipulates that, in use cases with significant impact, “data subjects” have a right to “obtain an explanation of the decision reached by an automated system and to challenge the decision.”7 These could be things like receiving a line of credit, receiving a home loan, being recruited for a job, receiving an insurance quote, etc. Transparency around targeted marketing is covered by the GDPR but does not fall under the same responsibility requirements. Break down your business process to identify what your model is doing locally. For example, a retail bank’s end-to-end system to sell credit cards includes some models that should be explainable (e.g., which factors determine loan eligibility) and others that may not need to be (e.g., which individuals are most likely to purchase a product). Breaking things down like this can help overcome fears about the black box.

Break down explainability into three different levels when evaluating what matters for your business:

1. Explain the intention behind how your system impacts customers
2. Explain the data sources you use and how you audit outcomes
3. Explain how inputs in a model lead to outputs in a model

6 The General Data Protection Regulation (GDPR), article 22: https://gdpr-info.eu/art-22-gdpr/. This framework uses GDPR as an example regulation guiding data privacy and data processing. It is the most recent example of legislation on the topic. Citations to GDPR are provided as context to help you shape governance efforts. This framework does not provide legal advice on compliance.

7 GDPR, recital 71 to article 22: https://gdpr-info.eu/recitals/no-71/. Note that this requirement also exists in the European Union Data Protection Directive from 1995.



Finally, get as clear as possible on what your data and outcomes actually optimize. Quantified machine learning systems rely on easy-to-measure proxies of complex, real-world events, and ethical pitfalls arise in the gap between the complexity of real life and the simplifying assumptions a system requires. Consider the example of the COMPAS recidivism prediction system, intended to guide justice officials to define an optimal sentence. Users naively interpreted this system as giving them information about recidivism likelihood as a function of sentence length. But the system actually shows likelihood to be convicted, given the data it analyzes and the limitations of what it can measure.8 Reframed like this, it’s evident that the system would expose systematic bias given historic incarceration trends in the United States.

Clearly identifying what information exists and is lacking from proxy metrics also helps businesses improve machine learning system performance. The inference gap between a proxy and a real, discrete outcome signal could equate to millions in lost revenue for the business.


[Figure: feedback loops can optimize for short-term outcomes (prospect conversion) or long-term outcomes (frequent use and revenue impact), from 2018 to 2050 and beyond.]

8 For a complete review of the system, see https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. For further analysis of what the system actually optimizes for, see https://medium.com/@yonatanzunger/asking-the-right-questions-about-ai-7ed2d9820c48.


    Key Questions for Problem Definition & Scope

    • Is everyone aligned on what success does and does not looklike for the use case?

    • How could your system negatively impact individuals? Who ismost vulnerable and why?

    • How much error in predictions can your business accept forthis use case?

    • Is it important to explain why a system made a decision aboutan individual?

    • Do you need PII or can you work with aggregateinformation?

    • Have you performed risk-based privacy and security assessmentsto identify what controls matter most?

    • Have you mapped out the end-to-end business process? Are youclear on what the model optimizes?


2. Design

The focus here is on the system front-end: the tangible interface consumers or internal users touch and use. Machine learning systems can be completely automated, where a model’s output automatically plugs into a website interface or app, or have a human in the loop, where the system provides information to an internal user, who then uses this information to help make a decision or sends feedback to help train the system and improve its performance. Different architectures raise nuanced privacy, security, and ethics issues.
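A minimal sketch contrasting the two architectures; everything here is an invented stub for illustration, not a real product API:

```python
# model_predict stands in for any trained model; helper names are invented.
FEEDBACK_LOG = []  # in practice, persisted and fed into retraining

def model_predict(features: dict) -> float:
    """Stub model: a real system would compute a score from the features."""
    return 0.75

def automated_flow(features: dict) -> str:
    """Fully automated: the model output plugs straight into the interface."""
    score = model_predict(features)
    return "show_incentive" if score >= 0.5 else "default_page"

def human_in_the_loop_flow(features: dict, agent_decision: str) -> str:
    """Decision support: an internal user sees the prediction and decides;
    the decision is logged as feedback to help train the system."""
    score = model_predict(features)
    FEEDBACK_LOG.append((features, score, agent_decision))
    return agent_decision

print(automated_flow({"visits": 3}))                        # show_incentive
print(human_in_the_loop_flow({"visits": 3}, "call_later"))  # call_later
```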

[Figure: two front-end architectures. Automated architecture: an API directly integrates into the front-end experience. Human-in-the-loop architecture: an API serves model predictions to an internal employee, who provides feedback to help train the system.]

PRIVACY

Making privacy transparent to users and providing means for them to control how their data is used is a critical design question. Designing for consent is not trivial. The legal and academic privacy communities currently lack consensus regarding the utility of individual consent for personal data: experts like Helen Nissenbaum (Professor of Information Science, Cornell Tech) and Martin Abrams9 recognize that data processing has passed beyond the ability of most people to understand and, in turn, provide truly informed consent on use. The dilemma businesses face cuts into the heart of innovation with machine learning: should you collect as much as possible to preserve optionality for future innovation, or as little as possible to interpret “minimum use” strictly? Should you manage privacy by explicit permission (you may do this and only this) or exclusion (you may do anything but that)?

9 Executive Director of the Information Accountability Foundation, Martin Abrams has recently researched the effectiveness of ethical data assessments on Canadian businesses and upholds the critical importance of neutrality in conducting assessments: http://informationaccountability.org/author/tiaf01/.



As with everything in this framework, there is no silver bullet and the decision is contextual: privacy teams need to analyze elements like what personal information is being collected, with which parties it is being shared, for what purposes it is collected, used, or disclosed, and what the risk of harm and other consequences would be if misuse were to occur.10

    A new challenge machine learning poses is to give users someinsight into what companies can infer about them based on theirbehavior, or what “profiles,” to use the language in GDPR, thecompany creates about them. Say a user opts out in the future. It’snot enough to remove their data from a customer relationshipmanagement (CRM) system or other system of record: some models mayneed to be retrained to remove inferred profiles about the useralso (see the section on deployment, monitoring, and maintenancefor further details).

Privacy policies should be easy to find and understand; that means translating them from legalese into words or product features that make their meaning clear. They shouldn’t be tucked away, but pushed to consumers at key points of engagement with a system as a gateway to continued use. Policies should be general enough to withstand changes in practice but specific enough to build trust.

The theory of contextual integrity proposes that privacy is about appropriate data flow: flow that conforms with contextual social norms. It’s not a procedural exercise of providing notice or getting consent. What kind of information is being sent, by whom, and to whom matters.

HELEN NISSENBAUM
Professor of Information Science, Cornell Tech

10 The Office of the Privacy Commissioner suggests these as factors for analysis in their 2018 “Guidelines for obtaining meaningful consent”: https://www.priv.gc.ca/en/privacy-topics/collecting-personal-information/consent/gl_omc_201805/#_determining.

The Office of the Privacy Commissioner of Canada suggests the following principles for obtaining meaningful consent:

1. Emphasize key elements when doing a contextual assessment of consent requirements
2. Allow individuals to control the level of detail they get and when
3. Provide individuals with clear options to say ‘yes’ or ‘no’
4. Be innovative and creative
5. Consider the consumer’s perspective
6. Make consent a dynamic and ongoing process
7. Be accountable: stand ready to demonstrate compliance




ETHICS

Design choices on how models structure and collect data can have an emotional impact on users and impact downstream quality.

Data users have to actively input into your system can be collected in drop-down lists, structured fields, or unstructured fields. Most of the time, designers structure fields to facilitate downstream analysis and quality. This choice can alienate users, as fields reveal implicit cultural assumptions. Consider gender. Some companies still use only male and female; as of 2014, Facebook presented users with 71 gender options. Individuals who self-identify with non-binary categories or resist gender identification will be emotionally impacted by strict binary choices.11 These representations are only exacerbated when systematized at scale in products.

As regards data quality, be mindful of questions to which users struggle to provide accurate answers. Garbage into your system is garbage out of your system. Uncertainty in input data will propagate into models and impact downstream quality. Nuanced subject questions related to insurance or banking are examples of information that, when input by a user, will likely be unreliable.

A/B testing can raise ethical questions if it involves users’ emotional response. Consider the 2014 experiment where Facebook tested whether posts with positive or negative sentiment would affect the happiness levels of 689,003 users (inferred by what they posted). Critics deplored the deliberate attempt to induce emotional pain for the sake of experimentation.12

Google’s new Duplex system has kindled debates about front-end transparency.13 Duplex makes phone calls to restaurants and other businesses on an individual’s behalf. The system mimics human speech patterns naturally enough to fool real humans into thinking it’s a real person (saying things like “Mm-hmm”). These issues are so new that there is no consensus on when lifelike AI is morally acceptable and when it’s not. Pragmatists suggest system tweaks to communicate that the system is a machine from the outset. The flip side of the equation is when users assume a system is an automated AI when in fact humans are in the loop processing data to inform future system intelligence (as with early versions of the scheduling agent x.ai or Facebook Messenger).14 Consumers may reveal more private information when they think only an abstract machine is watching.

SECURITY

Security incidents become visible to consumers when services stop working as usual, when consumers are forced to manage the aftermath of a breach, or when malware infects other systems they use. Designers should collaborate with security early to ensure interfaces for systems deemed high risk from the risk assessment include best-practice features like two-factor authentication. Anomalous activity that might indicate a breach should first be analyzed internally and communicated to a user as necessary. To instill additional trust, products can include features users can consult for health checks on various security measures.

11 https://www.telegraph.co.uk/technology/facebook/10930654/Facebooks-71-gender-options-come-to-UK-users.html
12 There are many analyses of the controversy. For example, https://techcrunch.com/2014/06/29/ethics-in-a-data-driven-world/.
13 There are many analyses of the controversy. For example, https://www.theguardian.com/technology/2018/may/11/google-duplex-ai-identify-itself-as-robot-during-calls.
14 https://www.theguardian.com/technology/2018/jul/06/artificial-intelligence-ai-humans-bots-tech-companies




    ETHICSHuman in the loop and decision support

    Additional ethical issues should be considered withhuman-in-the-loop systems.

    As with automated systems, designers have to make choices abouthow to format data collection, be

    that in drop-down lists, unstructured feedback, or binarychoices like thumbs up/down, to get additional

    training data. While these choices architect input fromtrainers, there is still room for subjective bias to

    creep in via choices humans make. Data quality issues can arisefrom outsourced services, like Mechanical

    Turk, where individuals lack subject matter expertise. Finally,trainers risk importing their own biases in

    selecting labels and options to train the system. Thought shouldbe put into who is qualified to train a

    system and what kinds of biases they should be aware of inproviding labels and feedback.

Next, systems can present varying levels of information about their outputs to shape the kind of feedback provided. One option is to simply present the most likely output, i.e., to minimize the information provided to an internal user. For example, a system could provide a retail bank's call center agents with a list of prospects to call without indicating anything about their scores. Interfaces can be more informative. To continue with the example, a system could show a score or provide information about which features influenced the score (less interpretable models won't be able to support such clarity, so designers need to work with scientists to support desired functionality; see the section about model prototyping for further information). Documentation should be provided to internal users so they understand how to use the system and what it tells them: remember that machine learning systems output scores based on the data they are trained on, not absolute probabilities. That someone is ranked 0.78 for propensity to convert on a scale from 0 to 1 in your system does not mean they will do what's predicted 78 percent of the time. Design choices can help users learn how to interpret model outputs, but education will likely be required.
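Where scores will be shown to internal users, it can help to measure how far they are from true probabilities before exposing them. Below is a minimal sketch, assuming a scikit-learn workflow; the propensity model and data are stand-ins, not a prescribed approach:

```python
# A reliability check: within each score bucket, how often did the
# predicted event actually occur? A 0.78 score is only "78 percent"
# if the two columns below track each other.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import calibration_curve, CalibratedClassifierCV

X, y = make_classification(n_samples=5000, random_state=0)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]

frac_pos, mean_score = calibration_curve(y_te, scores, n_bins=10)
for s, f in zip(mean_score, frac_pos):
    print(f"mean score {s:.2f} -> observed rate {f:.2f}")

# If the gap is large, recalibrate before surfacing scores to users.
calibrated = CalibratedClassifierCV(GradientBoostingClassifier(),
                                    method="isotonic", cv=5).fit(X_tr, y_tr)
```

Even a calibrated score remains an artifact of the training data, so the documentation point above still applies.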

Legal liability

Clear legal precedent defining liability for injuries caused by machine learning systems has yet to be established. Early analysis, however, predicts that fully automated systems and human-in-the-loop decision support systems will be subject to different liability analyses.

Suppose a customer has been in some way injured by a fully automated machine learning system. One line of reasoning would say that the organization acted in breach of a duty of care owed to the person who has been "injured" by the algorithm's decision making. What complicates this argument, however, is that the law of negligence has always assumed human limitations on what is or isn't "reasonable": only "unreasonable" acts or omissions can attract liability. Can an objectively "rational" algorithm be "unreasonable"? If not, how can an organization that deploys that algorithm be "unreasonable" in so doing? This paradox suggests that documentation around choices in algorithmic design could become increasingly important. When working with a software vendor, the organization may then have a claim for contribution or indemnity against the developer of the algorithm, on a product-liability theory.


ETHICS

Where the decision is made by an algorithm with a human in the loop, liability then sits with the employee who makes the decision, not the system. The employer may be held vicariously liable on the basis of respondeat superior, or what is rather un-woke-ly still known as "master-servant liability".

Legal theorists have looked to analogize autonomous systems to various areas of law (e.g., servant-master, animal husbandry, employee-employer, independent person, etc.). These analogies are likely imperfect given the level of intelligence of the system, the remoteness of human intervention, and the degree of independence between system and human.15

Engage your legal team when designing a system to address liability.

15 Many thanks to Carole Piovesan and Adam Goldenberg at McCarthy Tétrault LLP for their input on this section.

Key Questions for Design

• What type of consent is required for your system given contextual analysis and risk of harm?
• How can you make data collection procedures transparent to consumers?
• Will the formats you use to collect data alienate anyone?
• How will you enable end users to control use of their data?
• Should you make it clear to users when they engage with a system and not a human?
• Are you introducing uncertainty into your system by asking questions that are hard to answer?
• Do internal users understand how to interpret the outputs of your system?


TELUS is passionate about unlocking the promise that responsible AI and data analytics have to offer. We recognize the enormous social benefit and economic value that machine learning has to offer, and we're committed to working with academics, ethicists, data scientists and other thought leaders to ensure that we can deliver on the promise in a responsible, ethical manner that is respectful of privacy. To this end, our first priority is to earn and maintain our customers' and team members' trust. We are exploring a variety of techniques and strategies to accomplish this goal, including working with experts in de-identification to produce useful data sets that cannot be tied back to any individual person, enhancing our data governance model to enable us to properly identify and assess the social and economic impacts of any AI initiative, and leveraging our Customers First culture to build an innovative, agile and, most importantly, responsible AI program. Through responsible AI, we can make the future friendly.

PAMELA SNIVELY
Chief Data & Trust Officer, TELUS

3. Data Collection & Retention

One question you'll face in applying machine learning is whether you'll use only first-party data or also include public or private third-party data in your system. Don't fall into the trap of viewing this as a PII or no-PII question to satisfy compliance requirements: as Helen Nissenbaum shows, privacy is contextual, and users are shocked when data you argue is public shows up in an unexpected context. For example, Allied Irish Bank recently made headlines for spying on consumers when it included public social media data in models used to determine mortgage eligibility.16 Compliance would say the bank was onside, but the activities still carried reputational risk. The issue was appropriate collection and appropriate flow.

Many of the issues attributed to algorithmic bias start with data collection: if you've historically engaged with a certain demographic population, you will have more information about this group than about other groups, skewing systems to perform better on well-represented populations. Solving this starts with the data, not the algorithms. The algorithms simply learn a mathematical function that does a good job mapping inputs to outputs.

PRIVACY

How you collect and store data has privacy implications. Today's age of cheap data storage and the internet of things means you can collect massive amounts of information about series of events, be they posts on Reddit or Twitter, GPS locations, internet or set-top-box viewing data, you name it. Technical teams can choose to either collect all those data points and process them in batches to train algorithms, or treat the data like a stream, only collecting snapshots of trends over time.17 Using stream techniques, you never capture or store granular data about an individual, only approximations relevant for machine learning purposes. This lowers model accuracy, but there are techniques to bound errors to meet the requirements of your use case.
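As one concrete illustration of the stream approach, here is a minimal Bloom filter sketch (one of the probabilistic structures the footnote mentions). It answers "have we seen this user before?" without ever storing the identifier itself; the sizes and names are illustrative:

```python
# A Bloom filter stores only hashed bit positions, so identifiers are
# unrecoverable from it; the trade-off is a tunable false-positive rate.
import hashlib

class BloomFilter:
    def __init__(self, size=10_000, n_hashes=4):
        self.size, self.n_hashes = size, n_hashes
        self.bits = bytearray(size)

    def _positions(self, item: str):
        # Derive several independent positions from one hash family.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str):
        return all(self.bits[pos] for pos in self._positions(item))

seen = BloomFilter()
seen.add("user-8812")          # only hashed positions are stored
print("user-8812" in seen)     # True
print("user-9904" in seen)     # False (with high probability)
```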

Your data governance team has likely viewed data retention as a risk and developed procedures to delete data over time to protect the business (while respecting legal retention requirements). Machine learning teams will want historical training data to understand past events for future predictions. For example, a bank may want to model consumer behavior during a past economic cycle that resembles current conditions; these needs may conflict with strict retention procedures. Have these discussions early.

16 https://www.independent.ie/business/personal-finance/big-brother-aib-now-spying-on-customers-social-media-accounts-36903323.html

17 Example techniques include Bloom filters and cuckoo filters, as explored in this blog post: http://blog.fastforwardlabs.com/2016/11/23/probabilistic-data-structure-showdown-cuckoo.html. For an in-depth technical review, we recommend Micha Gorelick and Ian Ozsvald's High Performance Python: http://shop.oreilly.com/product/0636920028963.do.


Next, as the recent Cambridge Analytica scandal showed, there should be clarity about how data collected directly from a few consenting individuals can be used to make indirect inferences about many other, non-consenting individuals.18 Cambridge Analytica used connections in Facebook's networked graph to make inferences about the personality types of 80 million individuals from only 270,000 active survey participants. Lookalike inferences of this kind are core to many personalization projects: decide what inferences you'll allow about unknowing data subjects. Facebook also lacked rigorous methods to audit and verify that Cambridge Analytica complied with requests to remove information from the network graph after services were suspended; verification methods should be technical, not based on good faith.

A final thing to consider is data provenance. Data aggregators collect data from hundreds of sources and pull them all together to resell demographic information to customers. There are chains of interdependent liability between all the players in a data supply chain. Review contracts with third-party vendors and data providers carefully to identify surprising indemnity clauses that may indicate untoward data collection practices.

18 https://www.vox.com/policy-and-politics/2018/3/23/17151916/facebook-cambridge-analytica-trump-diagram


SECURITY

Modern machine learning toolkits are largely in the cloud; Amazon Web Services, Microsoft Azure, and Google Cloud Platform are the three largest providers. Some enterprises, particularly in regulated industries, are still hesitant to host sensitive customer data in the cloud and opt to build systems internally or work with consultants that build on-premise systems. This can negatively impact the business's ability to scale and govern machine learning systems.

If you decide to put data in the cloud or work with a cloud-based software provider, there are numerous standards to inform vendor risk management programs, including the ISO/IEC 27018 standard for managing PII in the cloud and regulations like the United States Health Insurance Portability and Accountability Act (HIPAA) Security Rule, which includes a security assessment tool largely following ISO/IEC 27001.

Vendors should apply security best practices internally and constantly educate customers on how they are strengthening their security posture. We encourage clients to dive deeper into our controls and software security, and most importantly, to have the discussions needed to gain and maintain their trust. That's being accountable.

CHRIS NELMS
EVP, Trust & Security, PrecisionLender

A few essential security controls to look for in third-party vendors

Encryption

Data should be encrypted at rest and in transit. Encryption keys should be rotated regularly to ensure that any vulnerability is limited to what was enciphered during a given key rotation. Keys should be 256 bits or longer, and there should be governance on who can access them. (A minimal sketch follows this list.)

Data Access

Data should never end up on the personal laptops or workstations of a third-party vendor. Data should be housed in a clean room and only accessed on a need-to-know basis by vendor scientists and developers.

Auditability

The vendor should keep logs of scientist and engineer access to computing clusters, databases, and even rows and fields in databases, with means to detect anomalies as needed. Data flows across network perimeters should be monitored.

Breach notification

The vendor should have processes to identify a breach, conduct a risk assessment to understand impact, and notify impacted parties in accordance with regulations.

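To make the encryption control concrete, here is a minimal sketch using the Python `cryptography` package: AES-256-GCM with a fresh nonce per record. Key storage and rotation (for example, via a cloud KMS) are assumed to happen elsewhere:

```python
# Authenticated encryption at rest with a 256-bit key.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # in practice, manage via a KMS and rotate
aead = AESGCM(key)

nonce = os.urandom(12)                     # never reuse a nonce under the same key
ciphertext = aead.encrypt(nonce, b"row: jane@example.com,K1A0B1", None)
plaintext = aead.decrypt(nonce, ciphertext, None)  # raises if tampered with
```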


ETHICS

If a product or service has historically been used by a certain subpopulation, data will be skewed to accurately represent the tastes, attributes, and preferences of this population at the expense of others. In Automating Inequality, for example, Virginia Eubanks shows how this impacted the performance of an automated system to predict instances of domestic child abuse. Data came from public health facilities in the United States, not private facilities. Given the structure of the United States healthcare system, low-income individuals tend to use public facilities while higher-income individuals use private facilities. As such, predictions were skewed towards behavior in lower-income families, going on to systematize bias.

Propensities of certain populations to engage with advertising or fill in forms will similarly skew performance. Analyze whether certain communities or populations engage with your business more than others and work to figure out why.


At Scotiabank, we don't see data governance as just a compliance exercise or a way to manage risk. For us, it's a source of competitive advantage and a way to deliver meaningful value to our customers, while maintaining their trust. Taking an ethical approach to AI is an essential part of that work. We see ourselves as custodians of our customers' data and know that our ability to protect it is intrinsically linked to the value and promise of our brand.

MIKE HENRY
Chief Data Officer, Scotiabank

Key Questions for Data Collection & Retention

• How will you manage the provenance of third-party data?
• Are there any underrepresented minorities in your data set?
• If a vendor processes your data, have you ensured it has appropriate security controls?
• Will your existing data retention schedules and procedures impact model training?
• Do you need to store every data point, or is it possible to manage data as a stream?
• Would people be surprised to see their data used in this context?


4. Data Processing

Data processing is the step where you prepare data for use in algorithms. The core data privacy challenge relates to protecting privacy beyond PII. Focusing narrowly on PII (fields in databases like first and last names, social insurance numbers, or email addresses) is not sufficient to guarantee privacy. You have to expand risk management to protect against the possibility of a breach even when a data set has been scrubbed of PII. The core ethics issues relate to deciding what types of inferred features or profiles your organization feels are appropriate and identifying tightly correlated features in data sets that can hide discriminatory treatment.

Let's examine both these issues using postal code.

PRIVACY

Consider this simplified example. Say your database includes information about an individual's postal code and gender, and you combine this with another database that has information about an individual's age. You don't have the name of the individual in either database. Can you identify this individual? With what likelihood?

As always, it depends. How many people live in the postal code? If it's a dense urban highrise, there may be a lot; in a rural hamlet, there may be just one person.19 We can continue this kind of analysis for each variable. Age might depend on the income level typical of a building: a location tailored to young professionals may have many 35-year-olds, whereas a different location may have a more varied age distribution. A postal code for a retirement home may skew much older. Having birth date rather than age will quickly narrow a set to a few people.


19 This is why use of postal code is problematic in certain jurisdictions. "The cell size of five rule is the practice of releasing aggregate data about individuals only if the number of individuals counted for each cell of the table is greater than or equal to five." https://www.ipc.on.ca/wp-content/uploads/2016/08/Deidentification-Guidelines-for-Structured-Data.pdf
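This kind of reasoning can be automated as a k-anonymity check against the cell-size-of-five rule cited above: count how many records share each quasi-identifier combination. A minimal sketch with pandas, where the column names and threshold are illustrative:

```python
# Cells (quasi-identifier combinations) with fewer than five records
# are re-identification risks under the cell-size-of-five rule.
import pandas as pd

df = pd.DataFrame({
    "postal_code": ["M5V", "M5V", "M5V", "K0A", "K0A", "K0A"],
    "gender":      ["F",   "F",   "M",   "F",   "M",   "M"],
    "age_band":    ["30s", "30s", "30s", "70s", "70s", "70s"],
})

cells = df.groupby(["postal_code", "gender", "age_band"]).size()
risky = cells[cells < 5]
print(risky)  # every combination here identifies fewer than 5 people
```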

[Figure: an individual record (name, address, postal code, email) passes through a cryptographic hash to produce a privacy aggregate used by the model, with a comparison and feedback loop.]


Identification risks grow when data is released publicly or when third-party data is used to augment first-party data, because third-party data might fill in gaps, increasing the ability to reverse engineer an individual from a group. As such, enterprises should have consistent practices for sharing data with third parties: if two startups hold two different views of people, each of which is private on its own, but collaborate with one another, they'll have the keys to unlock identity.

This is another area where the current best practice is to think critically and apply a risk-based approach. The Information and Privacy Commissioner of Ontario recommends the following process for risk-based de-identification of data:20

20 The full report includes further guidance on how to implement risk-based de-identification: https://www.ipc.on.ca/wp-content/uploads/2016/08/Deidentification-Guidelines-for-Structured-Data.pdf

1. Determine the release model: public, semi-public, or non-public.
2. Classify variables: direct identifiers and quasi-identifiers that can be used for re-identification.
3. Determine an acceptable re-identification risk threshold: the impact of an invasion of privacy.
4. Measure the data risk: calculate the probability of identification per row.
5. Measure the context risk: for non-public data, consider threats and vulnerabilities.
6. Calculate the overall risk: data risk x context risk.
7. De-identify the data: mask direct identifiers, modify equivalence classes, and ensure risk is below the desired threshold.
8. Assess data utility: consider the impact de-identification will have on system performance.
9. Document the process: for compliance, trust, and transparency.
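Step 7 usually comes down to masking direct identifiers and coarsening quasi-identifiers so equivalence classes grow. A minimal sketch of both moves; the columns, bins, and hash truncation are illustrative, not a vetted de-identification scheme:

```python
# Mask the direct identifier, then generalize quasi-identifiers;
# afterwards, re-measure cell sizes as in the earlier check.
import hashlib
import pandas as pd

df = pd.DataFrame({
    "email":       ["a@x.com", "b@y.com"],
    "postal_code": ["M5V 2T6", "M5V 1J1"],
    "age":         [34, 38],
})

df["email"] = df["email"].map(                      # mask direct identifier
    lambda v: hashlib.sha256(v.encode()).hexdigest()[:12])
df["postal_code"] = df["postal_code"].str[:3]       # keep only the FSA
df["age"] = pd.cut(df["age"], bins=[0, 30, 40, 60, 120],
                   labels=["<30", "30-39", "40-59", "60+"])
```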


The downside to de-identification is that it is not foolproof: there will be residual re-identification risk, which is why tolerance needs to be assessed and governed against.

An alternative technique that provides theoretical guarantees is differential privacy, which modifies the data set in such a way that the statistical features that matter for a model are preserved, but it's impossible to tell the difference between a distribution that contains a given individual and one that does not. Protections can be added at various points in the machine learning pipeline, with tradeoffs between model performance and privacy guarantees: as we saw above, the more questions you ask about an aggregate, the closer you get to an individual. Most differential privacy algorithms have a "privacy budget," or number of queries they can support before privacy guarantees weaken. Product management leaders need to consider these tradeoffs during implementation. At this time, differential privacy is in production at companies like Google, Facebook, Apple, and Uber, but it has yet to become de facto best practice in startups or the enterprise. It is still relatively new and difficult to implement effectively. Other privacy techniques include one-way hash functions, which make a cryptographic mapping of input data that cannot be reversed, and masking, which removes variables or replaces them with pseudonymous or encrypted information.
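To ground the idea, here is a minimal sketch of the Laplace mechanism for counting queries with a naive privacy budget. The production deployments mentioned above use far more careful accounting; this only illustrates the epsilon/accuracy/budget trade-off:

```python
# Each query spends budget; once exhausted, no more answers.
# A count has sensitivity 1, so Laplace noise of scale 1/epsilon
# gives epsilon-differential privacy for that single query.
import numpy as np

class PrivateCounter:
    def __init__(self, data, epsilon_per_query=0.1, budget=1.0):
        self.data, self.eps, self.budget = data, epsilon_per_query, budget

    def count(self, predicate):
        if self.budget < self.eps:
            raise RuntimeError("privacy budget exhausted")
        self.budget -= self.eps
        true_count = sum(1 for row in self.data if predicate(row))
        return true_count + np.random.laplace(scale=1.0 / self.eps)

ages = [34, 38, 71, 29, 55]                # stand-in data
counter = PrivateCounter(ages)
print(counter.count(lambda a: a > 40))     # a noisy answer, never exact
```

Note how the noise scale grows as epsilon shrinks: stronger privacy guarantees cost accuracy, which is exactly the tradeoff product leaders must weigh.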


DEGREES OF IDENTIFIABILITY

• Identifiable data: information containing direct and indirect identifiers.
• Pseudonymous data: information from which direct identifiers have been eliminated or transformed, but indirect identifiers remain intact.
• De-identified data: direct and known indirect identifiers have been removed or manipulated to break the linkage to real-world identities.
• Anonymous data: direct and indirect identifiers have been removed or manipulated, together with mathematical and technical guarantees, to prevent re-identification.


ETHICS

Feature engineering is the process of creating second-order features, or insights, relevant for a model from raw data. For example, in a model to predict customer churn, a first-order attribute would be something like gender, and a second-order attribute would be something like price sensitivity, inferred from a sequence of transactions. Each transaction doesn't have much value on its own, but the inference drawn from multiple transactions does. Decide what inferences your business will and will not permit for user segmentation and targeting.
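As a minimal sketch of how such a second-order feature is derived; the columns and the "price sensitivity" definition here are illustrative:

```python
# No single transaction reveals much, but the aggregate is a
# behavioural inference about the person -- the policy question above
# is whether your organization permits this inference at all.
import pandas as pd

tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "paid_price":  [8.0, 9.0, 7.5, 20.0, 21.0],
    "list_price":  [10.0, 10.0, 10.0, 20.0, 21.0],
})

features = (tx.assign(discounted=tx.paid_price < tx.list_price)
              .groupby("customer_id")["discounted"].mean()
              .rename("price_sensitivity"))
print(features)  # share of purchases made at a discount, per customer
```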

Be mindful of proxy correlations when processing data. Removing a column for gender or ethnicity won't guarantee that these factors are now absent from a model, as they can be tightly correlated with other features. For example, ethnic background is often correlated with postal code, given the tendency of some ethnic groups to settle in communities with people of similar ethnic backgrounds.
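One way to surface such proxies is to test how well the remaining features predict the dropped sensitive attribute: high accuracy means the model can still "see" it. A minimal sketch with scikit-learn, where the file and column names are hypothetical:

```python
# Train a simple classifier to recover the sensitive attribute from
# the features you intend to keep; a score well above chance signals
# a proxy that needs attention.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("training_data.csv")            # hypothetical file
X = pd.get_dummies(df.drop(columns=["ethnicity"]))
y = df["ethnicity"]

proxy_score = cross_val_score(LogisticRegression(max_iter=1000),
                              X, y, cv=5).mean()
print(f"ethnicity recoverable with accuracy {proxy_score:.2f}")
```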


[Figure: map of Chicago showing how postal code is often a proxy characteristic for ethnicity]


Key Questions for Data Processing

• Have you conducted a risk assessment on your data set and made an informed choice about which privacy technique is best for your use case and maturity level?
• Will socially sensitive features like gender or ethnic background influence outputs?
• Are seemingly harmless features like location hiding proxies for socially sensitive features?
• What psychological or behavioural inferences will your company use or ban for targeting or other predictions?


5. Model Prototyping & Quality Assurance

In this step, machine learning engineers experiment with different algorithms to find the best one for the job, train the model, and verify that the chosen model satisfies performance requirements (e.g., how accurate the model needs to be). Choosing the best model for a particular problem is not only a technical question of identifying the algorithm that performs best. Data and machine learning scientists should also consider business, ethical, and regulatory requirements when selecting algorithms.

PRIVACY

Sometimes teams turn to synthetic data to train models. The privacy argument is that synthetic data can mimic the statistical properties relevant for model performance without using real data that could compromise privacy. Be careful: expect that the model won't perform as well in the real world. While a synthetic data set can mimic the statistical properties of interest in a real-world data set, the two don't overlap exactly, which can create performance issues.
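A simple guardrail implied here: if you train on synthetic data, evaluate on held-out real data so the distribution gap shows up before deployment. A minimal sketch with stand-in data sets, where a shifted distribution plays the role of the "real" data:

```python
# A model fit on "synthetic" data is scored against data from a
# slightly different distribution; the AUC gap is the warning sign.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

synth_X, synth_y = make_classification(n_samples=2000, shift=0.0, random_state=0)
real_X, real_y = make_classification(n_samples=2000, shift=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(synth_X, synth_y)
print("AUC on synthetic:", roc_auc_score(synth_y, model.predict_proba(synth_X)[:, 1]))
print("AUC on real:", roc_auc_score(real_y, model.predict_proba(real_X)[:, 1]))
```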

SECURITY

The performance of machine learning systems depends on training data quality. If a malicious actor compromises training data, they can not only access sensitive information or take down a system, but also lead the system to produce the wrong outputs and behave differently than intended. A benign example is an application like Spotify changing weekly recommended songs based on activity from a different user than normal. A serious example is an autonomous vehicle run amok due to a hack into its GPS or visual control systems.

Techniques to hack a machine learning algorithm can be very subtle. Machine learning researcher Ian Goodfellow has focused on "adversarial examples that directly force models to make erroneous predictions."21 An adversarial example is a model input that has been modified with a small perturbation imperceptible to the human eye. The algorithm, however, can pick up on the perturbation and classify the input as something else. You think the algorithm is working, but it's learning the wrong thing. Audit data scientist workstations for vulnerabilities, standardize tooling across your team, and apply rules-based access controls to minimize risk.
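To illustrate how small such perturbations can be, here is a minimal FGSM-style sketch in plain NumPy against a linear classifier. The weights and input are stand-ins; real attacks and defenses are considerably more involved:

```python
# A tiny step against the gradient (which, for a linear model, points
# along the weight vector) flips the prediction even though the
# perturbed input is nearly indistinguishable from the original.
import numpy as np

w = np.array([1.0, -2.0, 0.5])         # trained weights (stand-in)
x = np.array([0.9, 0.4, 0.2])          # a legitimate input

def predict(v):
    return 1 / (1 + np.exp(-(w @ v)))  # sigmoid score

epsilon = 0.15
x_adv = x - epsilon * np.sign(w)       # FGSM-style perturbation

print(predict(x))      # ~0.55 -> class 1
print(predict(x_adv))  # ~0.42 -> class 0, yet x_adv is nearly x
```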

21 See, for example, http://www.cleverhans.io/security/privacy/ml/2017/02/15/why-attacking-machine-learning-is-easier-than-defending-it.html.


ETHICS

Addressing fairness requires that machine learning engineers make a paradoxical move and optimize for a different goal than strict accuracy. Recall that accuracy assumes the future will and should look like the past; if you don't want to replicate biased historical trends, you need to change what you optimize for.
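One concrete way to change what you optimize for is to add a fairness penalty to the training objective, so the loss rises when predicted positive rates diverge across groups. A minimal sketch of a demographic-parity-penalized log loss; the names and penalty weight are illustrative:

```python
# Standard log loss plus a penalty on the gap in mean predicted rate
# between two groups: lowering the loss now also means narrowing the gap.
import numpy as np

def fair_loss(y_true, scores, group, lam=1.0):
    eps = 1e-9
    log_loss = -np.mean(y_true * np.log(scores + eps)
                        + (1 - y_true) * np.log(1 - scores + eps))
    gap = abs(scores[group == 0].mean() - scores[group == 1].mean())
    return log_loss + lam * gap

y = np.array([1, 0, 1, 0])
s = np.array([0.9, 0.2, 0.6, 0.4])   # model scores (stand-in)
g = np.array([0, 0, 1, 1])           # group membership (stand-in)
print(fair_loss(y, s, g))
```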
