Are the emerging generative AI products compatible with Australian privacy requirements?
Market Insight 2 October 2023
Advances in artificial intelligence (AI) and in particular, generative AI products and their impact on business operations and individuals have dominated conversations both in the legal community and the public at large throughout 2023. Freely available AI services such as chatbots, generative AI (eg ChatGPT), writing assistants (eg Jasper AI) and unique image/art generators (eg Midjourney) (collectively, AI tools) are playing rapidly increasing roles in and across a number of business operations - from automating and optimising processes, through creating unique content to predicting customer preferences/trends.
However, as businesses seek to reap the rewards of these significant innovations, we must also recognise, consider and address the significant issues that arise from the adoption of these emerging AI tools (particularly in replacing human actions and consideration), including those issues associated with privacy risks that are directly proportionate to the volume and sensitivity of personal information that the AI tool processes.
Training, ongoing use and datasets
AI, AI tools or, for that matter, any other computer system seeking to develop sophisticated "intelligence" and ultimately provide utility to businesses (ie "think" like a human and mimic human behaviour) must be "fed" large, exponentially increasing and diverse volumes of complex information from wherever it can be obtained (ie datasets). In practice, this means that human programmers select an appropriate machine learning model, then prepare and supply information/datasets to the program in order to allow the computer model to train itself and then operate to find patterns and make predictions, with varying levels of supervision and tweaking of the parameters done by the human programmers. This creates a privacy risk both for people (i) making the enquiries (where personal information is used to refine the questions and is collected for re-use by the AI tool) and (ii) receiving the answers, who may unlawfully collect the personal information contained in the answer (ie the content generated), especially where they collect and use any sensitive information without consent. In addition, some AI tools are also now training themselves to find and ingest additional information from new sources.
While programmers will use a variety of sources of information to create the datasets on which the relevant AI tool is trained (and which it ingests and uses on an ongoing basis), a common, efficient and low-cost source of "endless" information is web scraping (ie automated processes to extract data from websites), including, in some instances, from sources not publicly available. OpenAI, the company behind ChatGPT, was reported to have trained GPT-3 with 570 gigabytes (totalling 300 billion words) worth of books, webtext, articles, blogposts and other writings, while Midjourney's image generation database consists of over 5 billion indexed images (including bespoke artwork), all of which were systematically scraped from the internet and other accessible datasets.
Such has been the uproar about this that future regulations may require providers of AI tools to disclose the sources of the datasets/information ingested by their products and will likely impose restrictions on certain categories of information that may be used for AI purposes without the relevant individual's consent. Such regulations may also remind people of (and possibly strengthen) intellectual property (IP) laws as regards this use, and businesses' privacy and IP obligations as regards their employees' use of AI tools.
Naturally, the aggregation of immense datasets from disparate sources (including by way of web scraping) raises ethical questions as to whether the collated datasets used both to train and for ongoing use by AI tools amount to "theft", plagiarism, breach of IP laws such as copyright including moral rights (particularly as concerns artwork and literary works) as creators of the original works (ie owners of the scraped content) are often not compensated, credited or provided the opportunity to consent to their works' inclusion in these datasets. Use of parts of works in AI tools is often also in breach of the creator's moral rights of integrity and attribution. While such was argued by several artists in a class-action lawsuit filed in January 2023 against Midjourney and other AI art generator companies in the US District Court for the Northern District of California (which case is yet to be determined), this article focusses on the often-forgotten other risks of using AI tools for business, those that arise in relation to privacy and cybersecurity.
What are the privacy and cybersecurity risks of using AI tools?
The specific cybersecurity and privacy risks that apply to the creation and/or use of any given AI tool will vary according to the following:
- the cybersecurity measures in place
- the datasets used both to train and for ongoing use by the AI tools
- the technical controls in place and
- whether the personal information is ever deleted or de-identified
Web scraping and other automated techniques that aggregate vast amounts of information from the internet and collate it into vast datasets, both for (i) the training of the AI tool and (ii) ongoing use by the AI tool, pose inherent risks to privacy since the information/datasets collected will likely include personal and/or sensitive information. This is so even where "best efforts" have been made (or are said to have been made) to de-identify or delete personal information, such as is "warranted" by ChatGPT and other premium services. As a result, AI tools that produce written content, answer specific queries or generate "blended images" may inadvertently make use of (and output) content that includes personal information (including user search histories) and IP-infringing works. These are risks both for people (i) making the enquiries (where personal information is used to ask the questions of the AI tool) and (ii) unlawfully (and sometimes unbeknownst to them) collecting personal information contained in the answers given or IP-infringing content generated by the AI tool, especially where such content contains sensitive information collected without consent. These risks are heightened where businesses use their own in-house solutions for developing and deploying AI tools and collect and hold large volumes of data (often containing personal information), often failing to ensure the security of their systems and the lawful collection of the personal information in the first place, putting them at significantly higher risk should any data breach occur.
Developing AI tools
Businesses that develop and/or use in-house AI tools must ensure that all information fed into those AI systems (especially personal information, any sensitive information and any copyright material) is collected by lawful means in compliance with existing privacy law requirements (and, in the case of sensitive information, that the appropriate consents have been obtained) and/or, for copyrighted works, that appropriate licences are obtained.
In practice, this means any collection of personal information (ie information that has not been appropriately de-identified) for the purposes of the training of, and for ongoing use by, AI tools must be notified to the individuals concerned in accordance with Australian Privacy Principle (APP) 5, and consent must be obtained in accordance with APP 3.3 where the data in question includes sensitive information. Employers will usually be liable for the actions of their employees in developing and deploying an AI tool and therefore must ensure that all information fed into or used with their AI tool and systems (and especially personal information) for development purposes is:
- collected by lawful means in compliance with existing privacy law requirements (such as appropriate notice being given, as per APP 5)
- obtained with consent for the collection of sensitive information and
- obtained with relevant licences for use of copyrighted materials
Using AI tools
In addition, even if not developing but simply using AI tools, businesses should monitor staff usage of all AI tools and develop internal policies that outline requirements (what is and is not permitted) in respect of the processing of relevant information and datasets (AI use policy) to ensure staff do not inadvertently disclose personal information they should not be disclosing when interacting with an AI tool (eg by developing, training or otherwise maintaining that AI tool), unlawfully collect personal information or use copyrighted works without a licence. Failure to ensure staff act in compliance with the privacy and IP laws for business uses will result in the business being liable for any breaches of privacy and IP laws arising from such employee actions. In this regard, an AI use policy will both help reduce and manage this risk of misuse and set up the "rogue employee" defence if the employee in question has flagrantly disregarded the requirements of such a policy.
Privacy risks also arise from the inadvertent or careless disclosures made by users (including staff) when interacting with AI tools. In trying to get the most out of the AI tool (the best answers/content), users may have in the back of their minds the old adage "garbage in, garbage out" and therefore provide an ever-increasing level of detail and specificity to refine their queries and get the most accurate answers. Irrespective of the purpose for which an AI tool is used, each use (ie each prompt entered into the AI tool/service) "creates" further AI tool content, and we are handing over to that AI tool, for ongoing public use, all information (including any personal or confidential information) we input to interrogate it. For example, a lawyer who asks or prompts an AI tool to review a draft commercial contract or other confidential or client proprietary information will have provided that AI tool with personal and confidential information, the result of which is that all such information may become public and remain available to others who use that AI tool. This is particularly problematic as not all (or currently many) AI tools/services are capable of reliably and appropriately de-identifying all personal information associated with the questions/prompts put to them. That is, further use of that AI tool by others will likely disclose that confidential or personal information if they ask the right questions.
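One control an AI use policy might call for is stripping obvious identifiers from a prompt before it leaves the business. The following is a minimal Python sketch of that idea only. The function name `redact_prompt` and both patterns are assumptions made for illustration, and pattern matching of this kind will miss names, addresses and contextual details, so it does not amount to de-identification.

```python
import re

# Hypothetical patterns for illustration only; real personal information
# takes many more forms than these two.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AU_MOBILE = re.compile(r"(?:\+61\s?4|\b04)\d{2}[\s-]?\d{3}[\s-]?\d{3}\b")


def redact_prompt(prompt: str) -> str:
    """Replace email addresses and Australian mobile numbers with placeholders."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    prompt = AU_MOBILE.sub("[PHONE]", prompt)
    return prompt


if __name__ == "__main__":
    text = "Contact Jane at jane.citizen@example.com or 0412 345 678 about the contract."
    print(redact_prompt(text))
```

Note that the name "Jane" survives redaction in this example, which is why a filter of this kind can only supplement, and not replace, staff training and the policy itself.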
In this sense, the human error of users (including employees) poses a real risk to business in relation to confidentiality and privacy law. Also, in the broader IP law context, instances where an AI tool creates "original" content that draws (ie copies) from existing works that were likely scraped from the internet or provided to it by careless (or reckless) users will lead to IP infringement by the business. It is important at this stage to remember that content on the internet is not free of copyright protections (or contractual obligations), nor does it lose its privacy protections.
This use also has the potential to create complicated flow-on issues where subsequent users of the AI tools will be provided with (and make use of) "original" and potentially copyright and privacy law-infringing content created by an AI tool. That means the user may unknowingly violate another party's rights (privacy rights as well as copyright and/or, for image/logo generating AI tools, trade marks and/or registered designs). Accordingly, training employees in the lawful (and business-permitted) use of AI tools and having an AI use policy will become a common and likely required business risk prevention practice as AI tools play an ever greater role in business.
Creating "new" personal information
AI tools may, in "creating" content (ie answering questions they are asked or performing tasks they are set), also create or present information in ways that were previously unavailable and, as a result, create personal information which is then, likely, unlawfully collected by users (and their employees). The most pressing practical concern is that AI tools with access to voluminous and disparate datasets and cutting-edge, powerful algorithms can re-identify previously de-identified or anonymised data and therefore create personal information (ie link the individual to the de-identified data to which it relates). This creates additional hurdles for businesses seeking to comply with both their collection and retention requirements under APPs 3 and 11.2. In practice, de-identified data, even in the public domain, can be re-identified if it is matched with other available data using AI analytics to make the links. Surprisingly few reference or data points are needed to match that data and identify the individual (or even a few alternate individuals) to which the information relates.
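The matching described above can be sketched in a few lines of Python. Every record, field name and the `link` function below are invented for illustration and come from no real dataset. The sketch shows a "de-identified" dataset (names removed) being joined to a second dataset that still carries names, using only three shared attributes.

```python
# "De-identified" records: names removed, but quasi-identifiers retained.
# All data here is fabricated for illustration.
deidentified = [
    {"postcode": "2000", "birth_year": 1984, "gender": "F", "diagnosis": "asthma"},
    {"postcode": "3056", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
]

# A second, separately available dataset that still carries names.
public = [
    {"name": "Person A", "postcode": "2000", "birth_year": 1984, "gender": "F"},
    {"name": "Person B", "postcode": "2000", "birth_year": 1971, "gender": "F"},
    {"name": "Person C", "postcode": "3056", "birth_year": 1990, "gender": "M"},
    {"name": "Person D", "postcode": "3056", "birth_year": 1990, "gender": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def link(deid_records, named_records, keys=QUASI_IDENTIFIERS):
    """For each de-identified record, list the named people matching on every key."""
    results = []
    for record in deid_records:
        candidates = [
            person["name"]
            for person in named_records
            if all(person[key] == record[key] for key in keys)
        ]
        results.append((record["diagnosis"], candidates))
    return results


if __name__ == "__main__":
    for diagnosis, candidates in link(deidentified, public):
        print(diagnosis, "->", candidates)
```

In this toy data the first record matches exactly one named person, while the second is narrowed to two candidates, mirroring the "few alternate individuals" outcome mentioned above.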
While ChatGPT and other AI tools state that they make efforts to operate using only de-identified data and implement content filters so that individuals cannot leverage the AI tools to access personal information (ie invade an individual's privacy), there are easily accessible methods to bypass such content filters (and these regularly circulate on the internet), allowing users of the AI tools to ask questions and use the AI tools beyond the scope of the "rules" that limit the AI tool's functionality. That is, once an AI tool's/chatbot's content filters are removed or modified, a user can put certain facts to the AI tool and request it to narrow down the identity of, for example, the relevant anonymous individual to the most likely few options, thus re-identifying the information despite the "best efforts" of the companies offering the AI tools to prevent this.
Among other concerns, this possibility supports the argument that the only way to comply with APP 11.2 is to irrevocably delete all personal information. In other terms, the view that "there is no such thing as de-identified information" (ie it can always at some time in some circumstances be re-identified) may continue to gain traction, as it appears it has in the Attorney-General's Privacy Act Review Report (AG Report), and lead to legislative change.
The recent exponential growth in the use of AI tools in business raises the noted privacy risks, which need to be considered, discussed and addressed by business along with the potential cybersecurity exposure caused by AI tools. Advanced AI tools involving chatbot/writing services will make it more efficient for cyber-criminals to generate and mass-produce sophisticated phishing and business email compromise emails, particularly as such chatbots are excellent "impersonators" (ie able to adopt personas on command). This, in turn, escalates the measures (technical, policies and training) that businesses need to put in place to meet their APP 11.1 obligations and, generally, secure their information.
How should businesses use AI tools in a privacy-compliant manner?
Businesses that leverage AI tools or solutions must implement privacy- and security-by-design principles to ensure that privacy and cybersecurity considerations are proactively contemplated and addressed from the beginning, during the initial design of the AI tool or proposed use of an AI tool and throughout its deployment, training, development and continued use. That is, privacy and security must be built into the design specification, physical infrastructure and business practices relating to the AI tool. In practice, this requires businesses to:
- implement robust cybersecurity measures across all systems relating to the AI tools, including access controls, encryption of data (including at rest) and use of two-factor authentication
- undertake a robust privacy impact assessment (PIA) to identify the impact that the use of the AI tool may have on the privacy of individuals (both of users and others) and methods to eliminate those impacts, both prior to allowing the AI tool to be used and periodically after its rollout and
- commit to best practice privacy compliance and cybersecurity from the outset of developing, deploying and using the AI tool, including providing appropriate notices and obtaining any necessary consents in respect of any personal/sensitive information used to train and for use by the AI tool and complying with deletion requirements as per legal requirements and any internal company data retention and destruction guidelines
Multi-jurisdiction businesses should also note that a number of international jurisdictions restrict (or impose conditions on) the use of AI tools/solutions. In particular, Art 22 of the EU/UK General Data Protection Regulation (GDPR) restricts the use of AI for automated decision-making (including profiling) that significantly impacts a data subject, unless such process is:
- provably necessary for the performance of a contract between the data subject and the business (ie where it is impractical to deal with large volumes of data)
- authorised by other EU/UK laws or
- based on the data subject's explicit consent
Accordingly, businesses subject to EU/UK law will have heightened obligations in instances where AI tools are leveraged for, or as part of, automated decision-making.
Further, having an adequate workplace AI use policy, and ensuring that employees are adequately trained on it and on the use of AI tools, cybersecurity and privacy risks generally, are also crucial components of ensuring that the use of AI tools remains compliant with privacy (and other, including IP) law. Businesses should only allow employees (including contractors) to process personal information used by/fed into the AI tool or interact with AI tools where those employees are trained to know what personal information can/cannot be disclosed and collected, to identify and report privacy breaches (such as where the AI tool is engaging in unlawful collections or monitoring of individuals) and to avoid inadvertently generating or re-identifying personal information or unlawfully collecting or disclosing it.
The AG's Privacy Act Review Report proposed changes
With the publication of the AG Report in February 2023, Australia's future privacy compliance landscape, including in respect of AI, is expected to undergo rapid evolution as a result of 116 key proposals for change.
Chief among the anticipated privacy reforms are (in response to recent significant data breaches/cyber incidents) the amendments to the data minimisation requirements under APP 11.2 including:
- a requirement that businesses establish maximum/minimum retention periods in relation to the personal information held and a requirement to periodically review such
- enhancing the Office of the Australian Information Commissioner's guidance to clearly define what "reasonable steps" businesses should undertake to destroy and/or de-identify personal information by industry
- imposing ongoing obligations on de-identified information and
- leaving open the potential for further legal reform with the AG Report suggesting continued examination of legislative retention periods
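If maximum retention periods of this kind are adopted, a business would need some routine way to find records held past their limit. The Python sketch below shows one way that check could look. The categories, the day counts and the function name `overdue_records` are hypothetical and are not drawn from the AG Report or the Privacy Act.

```python
from datetime import date, timedelta

# Hypothetical maximum retention periods (in days) per record category.
# These figures are placeholders, not legal guidance.
MAX_RETENTION_DAYS = {
    "marketing_contact": 365,
    "job_applicant": 730,
}


def overdue_records(records, today):
    """Return the ids of records held longer than their category's maximum period.

    Records in a category with no configured period are skipped here; a real
    process would more likely flag them for manual review.
    """
    overdue = []
    for record in records:
        limit = MAX_RETENTION_DAYS.get(record["category"])
        if limit is None:
            continue
        if today - record["collected_on"] > timedelta(days=limit):
            overdue.append(record["id"])
    return overdue


if __name__ == "__main__":
    sample = [
        {"id": 1, "category": "marketing_contact", "collected_on": date(2022, 1, 1)},
        {"id": 2, "category": "job_applicant", "collected_on": date(2022, 6, 1)},
    ]
    print(overdue_records(sample, today=date(2023, 10, 2)))
```

Records flagged this way would then be destroyed or de-identified under whatever "reasonable steps" apply to the business, and the periods themselves reviewed periodically as the proposal contemplates.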
Alongside these proposals, the AG Report further proposes to amend APP 11.1 to include more detail in respect of the technical and organisational measures required to ensure the security of personal information, enhance (and more precisely define) consent requirements, remove the small business exemption (meaning that businesses with a turnover of less than $3 million may be caught by the Privacy Act 1988 (Cth), including as concerns AI usage) and generally bring Australia's privacy landscape more into line with the more rigorous practices seen under the EU/UK GDPR.
Accordingly, businesses should expect to see the baseline privacy and cybersecurity requirements in respect of AI tools only increase (significantly) if these proposals are enacted into law.
While implementing and using AI tools will become increasingly necessary to maintain their competitive advantage, businesses that wish to use AI tools/solutions risk falling foul of current (and likely future) Australian and international privacy and IP law requirements, including as to notification, consent and data minimisation. In addition, businesses risk increasing the potential scale of, and the negative outcomes arising from, any "eligible data breaches". The surest future-proofing that businesses can adopt today to accommodate the use of AI tools into the future is to ensure any AI tools are used in adherence with an AI use policy which includes (or references) a privacy- and security-by-design framework, and to train all employees on the policy and in privacy- and IP-compliant use of AI tools.
With the penalties under the Privacy Act recently increased to up to the greater of $50 million, three times the value of any benefit obtained from the misuse of information and 30% of a company's turnover in a relevant 12-month period, as well as the increasing potential compliance requirements emerging from the AG Report, organisations must increasingly adopt best practice privacy and cybersecurity practices, especially as concerns the rapidly growing use of and reliance on AI tools by businesses.
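To make the "greater of" formula concrete, the short Python sketch below computes the ceiling from the three limbs as summarised above. It is a simplified reading of that summary, not the statutory test itself. The function name `max_penalty_aud` and the input figures are hypothetical, and amounts are whole Australian dollars.

```python
FIXED_LIMB_AUD = 50_000_000  # the $50 million limb from the summary above


def max_penalty_aud(benefit_aud: int, turnover_aud: int) -> int:
    """Return the greatest of the three limbs, in whole dollars.

    benefit_aud: value of the benefit obtained from the misuse of information.
    turnover_aud: the company's turnover in the relevant 12-month period.
    """
    benefit_limb = 3 * benefit_aud
    turnover_limb = 30 * turnover_aud // 100  # 30%, using integer arithmetic
    return max(FIXED_LIMB_AUD, benefit_limb, turnover_limb)


if __name__ == "__main__":
    # Hypothetical company: $10 million benefit, $400 million turnover.
    print(max_penalty_aud(10_000_000, 400_000_000))
```

For the hypothetical company in the example, the limbs are $50 million, $30 million and $120 million, so the turnover limb sets the ceiling. For a small business with modest turnover and benefit, the $50 million limb applies.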
Footnote: Attorney-General's Department Privacy Act Review Report (2022) www.ag.gov.au/sites/default/files/2023-02/privacy-act-review-report_0.pdf.
*Published in LexisNexis Privacy Law Bulletin 2023 Vol 20 No 5