China’s National Cybersecurity Standardisation Technical Committee issued a set of technical standards with immediate effect in March (Technical Standards), followed by a consultation draft of proposed national standards in late May (May Draft). Both have attracted market attention given the enthusiastic take-up of AI in the world’s second largest economy, as elsewhere.
Why are these standards of significance?
These two standards provide guidance on how to fulfil various security requirements when deploying GenAI products and services within the framework introduced by China’s seminal first GenAI regulation, which came into effect in August last year.
The Technical Standards focus on the “Basic Security Requirements for Generative Artificial Intelligence Services”, including training data and model security, data protection and minors’ protection, and security assessment. The May Draft – once finalised – will convert largely the same requirements into national standards having higher authoritative weight.
Businesses offering AI services, or investing in the AI industry, in the PRC market should keep abreast of these regulatory developments given the downside compliance risks that come with the huge opportunities presented by this emerging technology.
What are basic security requirements?
We summarise below some of the key requirements under the Technical Standards and the May Draft relevant to GenAI service providers.
Requirement | Summary |
---|---|
Content filtering | Under the PRC rules on content governance, the term “illegal and unhealthy information” consists of 11 types of illegal information and nine types of unhealthy information. GenAI service providers are required to filter out illegal and unhealthy information, refrain from IP infringement, and ensure data protection by, for example, using keywords, classification models and manual inspection of data samples. |
Positive content | To promote the generation of positive content, GenAI service providers should conduct security testing on content inputted by users, and improve their models by means of fine-tuning. |
Annotation | GenAI service providers must reserve sufficient and reasonable time for personnel to annotate data sets. The annotation rules observed by these staff should be set for each of the 31 types of major security risks. The risk types include content that violates Chinese core socialist values, constitutes illegal business practice, is discriminatory, infringes others’ legal rights, or is otherwise unable to meet the security requirements of specific service types. |
Model filing | Where GenAI services are offered based on a third-party foundation model, the Technical Standards suggest that the service provider should only use a model that has been filed with the competent PRC regulator. This echoes the filing requirements under the GenAI regulation. The May Draft, on the other hand, proposes to remove this requirement. While there is no official clarification about the rationale for this proposed amendment, some market commentary has been quick to suggest that it indicates a shift in the PRC government’s attitude towards further support of AI development. |
Protection of minors | GenAI service providers are required to:
|
Opt-out mechanism | Users must be provided with an opt-out before their personal information is used for GenAI training purposes. The opt-out function should be easily accessible within four clicks, and users should be clearly informed of their right to refuse such use of their data. |
Transparency | GenAI service providers should display in a prominent manner information relating to:
|
Training data security: the 5% rule
The original draft of the Technical Standards proposed criteria that indicate when a source of training data should be excluded from training generative AI: a so-called “corpus blacklist” (i.e. banned collections of structured text whose large size might otherwise make them attractive for machine learning).
While the blacklist was removed from the final version of the Technical Standards, both that version and the May Draft still outline which corpus sources are banned and expect GenAI service providers to conduct security assessments to identify these sources in their training data.
If more than 5% of a single source of training data consists of illegal and “unhealthy” information, GenAI service providers must not use training data originating from that source.
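As a rough illustration of how the 5% threshold might be applied in a data pipeline, the sketch below checks a reviewed sample from one source. The names `SourceSample` and `source_is_usable` are hypothetical, and the assumption that the share is measured on a reviewed sample of items is ours; the standards themselves do not prescribe this implementation.

```python
from dataclasses import dataclass


@dataclass
class SourceSample:
    """Review result for one training-data source (hypothetical structure)."""
    name: str      # label for the data source
    reviewed: int  # number of items reviewed from this source
    flagged: int   # items judged illegal or "unhealthy"


def source_is_usable(sample: SourceSample) -> bool:
    """Return False when more than 5% of the reviewed items are flagged.

    Integer arithmetic is used so that a share of exactly 5% is not
    misclassified by floating-point rounding.
    """
    if sample.reviewed <= 0:
        raise ValueError("at least one item must be reviewed")
    if not 0 <= sample.flagged <= sample.reviewed:
        raise ValueError("flagged must be between 0 and reviewed")
    # Excluded only when flagged / reviewed > 5%
    return sample.flagged * 100 <= sample.reviewed * 5
```

On this reading, a source with exactly 5% flagged content remains usable, because the text bans sources containing "more than 5%".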
Sources of training data should also be traceable through clear evidence of authorisation of use or licensing records, such as an open-source licence agreement or other commercial agreement. GenAI service providers will therefore need to maintain records of data sources (for audit and potentially investor due diligence purposes).
Security assessment
Similar to the requirement introduced by China’s GenAI regulations, the Technical Standards suggest each GenAI service provider should conduct a security assessment of broad, undefined scope, either by itself or by engaging a third-party assessment agency, before launching its GenAI services or when material changes occur. However, the current form of the May Draft does not incorporate this requirement.
That said, this compliance burden is not eliminated entirely as both the Technical Standards and the May Draft require service providers to assess training data safety, generative content safety, and their bank of rejected prompts.
For example, where manual sampling is used, at least 4,000 items should be randomly sampled from all training data and the pass rate should not be lower than 96%; where keywords, classification models and other technical methods are used to conduct sampling, at least 10% of the total training data should be randomly sampled, and the pass rate should be at least 98%.
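The two sampling thresholds can be expressed as simple checks. The following is a minimal sketch, assuming that the pass rate is the number of passing items divided by the number of sampled items; the function names are hypothetical and are not taken from the standards.

```python
def _validate(sample_size: int, passed: int) -> None:
    """Basic sanity checks shared by both assessments."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= passed <= sample_size:
        raise ValueError("passed must be between 0 and sample_size")


def manual_sampling_ok(sample_size: int, passed: int) -> bool:
    """Manual inspection: at least 4,000 items sampled, pass rate >= 96%."""
    _validate(sample_size, passed)
    return sample_size >= 4000 and passed * 100 >= sample_size * 96


def technical_sampling_ok(total_items: int, sample_size: int, passed: int) -> bool:
    """Keyword or classifier-based inspection: at least 10% of all
    training data sampled, pass rate >= 98%."""
    _validate(sample_size, passed)
    if sample_size > total_items:
        raise ValueError("sample_size cannot exceed total_items")
    return sample_size * 10 >= total_items and passed * 100 >= sample_size * 98
```

For instance, a manual review of 4,000 items needs at least 3,840 passes, while a technical review of a one-million-item corpus needs a sample of at least 100,000 items with at least 98% of that sample passing.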
Looking ahead
A comprehensive AI Law is on its way according to the State Council’s 2024 legislation plan, and more guidance covering security requirements for manual annotation, pre-training and fine-tuning of data used in GenAI is also under formulation.
Compliance expectations are also being shaped by China’s courts as they resolve early AIGC-related disputes over the copyright status of AIGC works, IP protection for AI-generated voices, and copyright infringement by AI output.
However, over the last couple of months, China’s cyberspace regulator has released official AIGC-specific filing information on 117 AIGC services and several batches of algorithm filings. This shows that while the compliance landscape in China is seen by some commentators as more burdensome than other markets, the huge potential of GenAI services means that progressive businesses are still seeking to deploy GenAI tools here.
The Technical Standards, together with the May Draft (once finalised), despite not having binding effect, serve as a relatively comprehensive playbook for those progressive businesses on how to comply with the otherwise vague obligations introduced under August’s GenAI regulations. Investors and other companies looking to use GenAI in China should analyse these standards closely if they are to benefit from the prospective advances in technology.