bring a bucket of water into a room full of sensitive electronics, even if it never intends to use
the water in that room.
There are several information-theoretic measures that attempt to capture an agent’s potential
for influence over its environment, which are often used as intrinsic rewards. Perhaps the best-
known such measure is empowerment [131], the maximum possible mutual information between
the agent’s potential future actions and its potential future state (or equivalently, the Shannon
capacity of the channel between the agent’s actions and the environment). Empowerment is
often maximized (rather than minimized) as a source of intrinsic reward. This can cause the
agent to exhibit interesting behavior in the absence of any external rewards, such as avoiding
walls or picking up keys [103]. Generally, empowerment-maximizing agents put themselves in
a position to have large influence over the environment. For example, an agent locked in a
small room that can’t get out would have low empowerment, while an agent with a key would
have higher empowerment since it can venture into and affect the outside world within a few
timesteps. In the current context, the idea would be to penalize (minimize) empowerment as
a regularization term, in an attempt to reduce potential impact.
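For reference, the n-step empowerment of a state $s_t$ can be written (following [131]; the notation here, including the penalty weight $\lambda$, is ours) as the capacity of the channel from the next $n$ actions to the resulting state:

$$\mathfrak{E}(s_t) \;=\; \max_{p(a_t,\ldots,a_{t+n-1})} I\big(A_t,\ldots,A_{t+n-1};\; S_{t+n} \,\big|\, s_t\big),$$

so the penalized objective sketched above would be something like $r(s,a) - \lambda\,\mathfrak{E}(s)$ with $\lambda > 0$.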
This idea as written would not quite work, because empowerment measures precision of control
over the environment more than total impact. If an agent can press or not press a button to
cut electrical power to a million houses, that only counts as one bit of empowerment (since
the action space has only one bit, its mutual information with the environment is at most one
bit), while obviously having a huge impact. Conversely, if there’s someone in the environment
scribbling down the agent’s actions, that counts as maximum empowerment even if the impact
is low. Furthermore, naively penalizing empowerment can also create perverse incentives, such
as destroying a vase in order to remove the option to break it in the future.
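To make the counting argument concrete, the following toy sketch (our illustration, not part of the original text) computes one-step empowerment as channel capacity via the Blahut-Arimoto algorithm; for the power-switch channel it can never report more than one bit, however consequential the outcome:

```python
import numpy as np

def empowerment_bits(p_next, iters=200):
    """One-step empowerment of a state: the Shannon capacity (in bits) of the
    channel p_next[a, s] = P(next state s | action a), computed with the
    Blahut-Arimoto algorithm. Assumes every listed next state is reachable
    under at least one action."""
    n_actions = p_next.shape[0]
    p_a = np.full(n_actions, 1.0 / n_actions)      # action distribution, start uniform
    safe = np.where(p_next > 0, p_next, 1.0)       # avoid log(0); zero entries contribute 0
    for _ in range(iters):
        p_s = p_a @ p_next                          # marginal distribution over next states
        kl = np.sum(p_next * np.log2(safe / p_s), axis=1)  # D(p(.|a) || p_s) in bits
        p_a = p_a * np.power(2.0, kl)               # Blahut-Arimoto multiplicative update
        p_a /= p_a.sum()
    p_s = p_a @ p_next
    return float(p_a @ np.sum(p_next * np.log2(safe / p_s), axis=1))

# The power-switch example: one action blacks out a million houses, the other
# leaves them lit.  The impact is enormous, but the channel has only two
# inputs, so the measured empowerment cannot exceed log2(2) = 1 bit.
power_switch = np.array([[1.0, 0.0],    # press button       -> power off
                         [0.0, 1.0]])   # leave button alone  -> power on
print(empowerment_bits(power_switch))   # prints ~1.0
```

The same quantity would be maximal in the "scribe" case, since every distinct action leaves a distinct trace in the environment, even though nothing of consequence changes.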
Despite these issues, the example of empowerment does show that simple measures (even purely
information-theoretic ones!) are capable of capturing very general notions of influence on the
environment. Exploring variants of empowerment penalization that more precisely capture the
notion of avoiding influence is a potential challenge for future research.
• Multi-Agent Approaches: Avoiding side effects can be seen as a proxy for the thing we
really care about: avoiding negative externalities. If everyone likes a side effect, there’s no
need to avoid it. What we’d really like to do is understand all the other agents (including
humans) and make sure our actions don’t harm their interests.
One approach to this is Cooperative Inverse Reinforcement Learning [66], where an agent and
a human work together to achieve the human’s goals. This concept can be applied to situations
where we want to make sure a human is not blocked by an agent from shutting the agent down
if it exhibits undesired behavior [67] (this “shutdown” issue is an interesting problem in its
own right, and is also studied in [113]). However, we are still a long way from practical
systems that can build a rich enough model to avoid undesired side effects in a general sense.
Another idea might be a “reward autoencoder”,² which tries to encourage a kind of “goal
transparency” where an external observer can easily infer what the agent is trying to do.
In particular, the agent’s actions are interpreted as an encoding of its reward function, and
we might apply standard autoencoding techniques to ensure that this encoding can be decoded accurately.
Actions that have lots of side effects might be more difficult to decode uniquely to their original
goal, creating a kind of implicit regularization that penalizes side effects.
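Read very literally, one toy tabular instantiation of this idea (the function names, and the Bayesian decoder standing in for "standard autoencoding techniques", are our own illustration) is to keep a small set of candidate reward functions with near-optimal policies for each, decode a posterior over candidates from the observed trajectory, and pay the agent a bonus for how identifiable its true objective is:

```python
import numpy as np

def decode_goal_posterior(trajectory, policies, prior=None):
    """Toy 'decoder': given softmax-optimal policies for K candidate reward
    functions (policies[k][s, a] = pi_k(a | s)), return the posterior
    probability of each candidate reward given an observed trajectory of
    (state, action) pairs."""
    K = len(policies)
    log_post = np.log(np.full(K, 1.0 / K) if prior is None else np.asarray(prior))
    for s, a in trajectory:
        for k in range(K):
            log_post[k] += np.log(policies[k][s, a] + 1e-12)
    log_post -= log_post.max()                 # for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def transparency_bonus(trajectory, policies, true_reward_index):
    """Hypothetical 'goal transparency' term: the better an observer can
    recover the agent's true objective from its behaviour, the larger the
    bonus.  Trajectories cluttered with incidental side effects decode less
    cleanly and therefore earn less."""
    post = decode_goal_posterior(trajectory, policies)
    return float(np.log(post[true_reward_index] + 1e-12))
```

In a full system the decoder would presumably be learned jointly with the policy, closer in spirit to an actual autoencoder, but the structure is the same: behaviour serves as the code, and reconstruction accuracy serves as the regularizer.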
• Reward Uncertainty: We want to avoid unanticipated side effects because the environment
is already pretty good according to our preferences—a random change is more likely to be
very bad than very good. Rather than giving an agent a single reward function, it could be
²Thanks to Greg Wayne for suggesting this idea.