Zhang Linghan: Strengthening Algorithm Governance and Regulatory Standards for AI Trust

Zhang Linghan emphasizes the need for clear standards in AI trust and enhanced governance to mitigate risks associated with algorithmic decisions.

Introduction

As the reform of the market-oriented allocation of data elements deepens, institutional construction, value release, and governance coordination have become key issues in promoting high-quality development of the digital economy. Against this backdrop, experts, scholars, local government officials, and business representatives discuss new paths for high-quality data development.

Reducing Data Bias in AI Training

Question: With the available stock of training data nearing its limits, what trends will emerge in the sources and construction methods of future large model training data?

Zhang Linghan: Data is the core foundational element for training AI models and is essential for the differentiated competition and continuous advancement of large models. The quality of the training data corpus directly determines the capabilities of AI large models and affects the compliance and fairness of AI outputs. Specifically, future corpus construction should focus on three dimensions:

  1. Clearly define the legitimacy of online data sources, excluding unauthorized personal information, infringing content, and non-compliant data from the training corpus to prevent low-quality and harmful data from entering the training process.
  2. Coordinate copyright rules to clarify the reasonable boundaries for offline data use, balancing data utilization and copyright protection to avoid data supply issues due to copyright disputes.
  3. Promote the establishment of cross-domain data circulation and transaction rules, improve data supply incentive mechanisms, and encourage lawful, compliant data sharing and transactions to provide institutional support for high-quality corpus construction. Compared with market data, data held by public service institutions such as government departments and research institutes is inherently authoritative, accurate, and broad in coverage; incorporating it can enrich the dimensions of training data, effectively reduce data bias in model training, and enhance the fairness and reliability of AI outputs.

Adapting Regulatory Models to AI Technology

Question: In the face of rapidly evolving AI technology, how should we optimize governance and regulation of AI and algorithms?

Zhang Linghan: Current governance of AI and algorithms can no longer rely solely on post-event remedies; the focus should shift toward prevention and in-process control, adapting regulatory models to the rapid iteration of AI technology. A more comprehensive preemptive governance system should be established, improving core institutional tools such as filing, labeling, evaluation, safe harbors, and regulatory sandboxes. We must also strengthen in-process control of AI and algorithms to achieve transparent and standardized regulation:

  1. Based on the principle of information disclosure, enhance algorithm transparency by requiring companies to disclose data sources, decision processes, and algorithmic logic for AI algorithms that affect public interests and personal rights.
  2. Based on the principle of public participation, conduct algorithm impact assessments focused on potential risks such as algorithmic bias, data misuse, and rights violations, inviting the public, experts, and regulators to participate so that errors and biases in algorithms can be identified and corrected in a timely manner.
  3. Based on the principle of providing reasons, implement algorithmic explanation rights, ensuring that when AI makes decisions affecting user rights, users are clearly informed of the basis, process, and rationale of the decision, safeguarding their rights to know and to supervise.

Establishing Standards for Reasonable Trust

Question: In the deep application of AI, how can we prevent damage caused by AI hallucinations? If such damage occurs, how should we delineate responsibility?

Zhang Linghan: If erroneous content generated by AI hallucinations is trusted by users, it may lead to rights violations. Service providers must inform users of the risks and guide them toward rational trust, reducing the risk of hallucination-related harm at the source:

  1. Require AI service providers to prominently display warnings such as “This content is AI-generated and for reference only,” guiding users to view AI outputs rationally and minimizing the risk of blind trust.
  2. Clarify the applicable standards for reasonable trust in highly capable AI. When AI systems approach or exceed the cognitive abilities of ordinary users, the standard for what counts as reasonable trust in AI-generated content can vary significantly. Institutional design therefore needs to define the conditions under which users may reasonably rely on generated content and to develop differentiated standards for different scenarios.
  3. Confirm the duties of care and the allocation of responsibility among model providers, system deployers, and tool providers. Generative AI services often involve multiple parties, including model, platform, and tool providers, who differ significantly in technical control, risk foreseeability, and actual ability to intervene. The strength of each party’s duty of care should be judged based on the generality of the model, the risk level of the application scenario, and the specific design and deployment of the product.
