Stuart Russell’s Post


Professor of Computer Science at UC Berkeley

A new white paper proposing that we should build AI systems that we can understand, predict, control, and make high-confidence statements about: https://lnkd.in/eNRS6ETc

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

arxiv.org

This is me playing devil's advocate...

1. What prevents a current AD/AV company from taking one look at that and saying, "Look! We are _currently_ doing that with our systems, and thus we are guaranteeing safety!"? I'm sure they can come up with a narrative that fits the paper, especially since the paper is not extremely specific.

2. Exactly which "world model" includes a pedestrian being dragged by a robotaxi after being knocked to the ground? How about something else entirely, say, people trying to put traffic cones on top of its hood? Again, there is talk of world models, but to exactly what extent is any of them one?

3. Exactly what are the edge cases? What determines "extreme values"? What keeps the "edge" from simply being "what we've never bothered to program into the models because we've never expected anything like that"?

Mark Montgomery

Founder & CEO of KYield. Pioneer in Artificial Intelligence, Data Physics and Knowledge Engineering.

2mo

Three words are used consistently and accurately in this paper -- framework, towards, and conceptual. Notably missing is executable, which we can do today with data management. The problem is that the definition of AI in this paper is actually LLMs. Although this is a good step 'towards', it's early and still theoretical, and in the meantime experimental systems with extremely weak safety engineering, as cited at OpenAI, have already been scaled to a naive public, few of whom are in a position to make a judgement on safety, while government representatives are failing to protect them. I'm skeptical that LLMs can achieve the safety-critical equivalents found in other technologies containing catastrophic risk. The physics involved are completely different. It will be a game of cat and mouse with LLMs for the foreseeable future.

Garrett Galloway, D.Sc.

SecEng, Generative AI Security, Red Team, Mentor, Educator.

2mo

I'm not a fan of the title of this. There is no guaranteed safe. I understand the levels of Safety Specification and Audit as defined in the paper, but there's no tractable way to achieve anything beyond level 3, or possibly 4, in safety, and we'll never reach a validation level of 5. We can't test all possible anything in our analog world; trying to do so is a fool's errand.

We currently rely on people for many of the things we need a guarantee of safety on, and people cannot be forced to function in any guaranteed manner. We don't have a sole source of truth on what guaranteed safe means. Also, who audits the auditor, and what is the auditor's guarantee of safety in auditing? Where does the circular logic end?

When you start mucking about with defining safety in interpreted language, you inherently destroy repeatability and reliability - you remove the guarantee. If something is complex enough to interpret free-flowing human language and how we might define safety, it's not going to be reliable. Humans are not reliable. We created the language, and we don't all universally agree on what any of it means. What would change if the AI created it? People can't even secure the systems that host AI. Maybe retitle it Towards AI Safety Engineering.

Jim Welsh

Esoteric Writing and insight to Awareness adaptation to Sustainability through the Human condition and its implications to Business and Holistic integration. Anthesis Designs.

2mo

The road to hell is paved with good intentions. AI is the Wild West: no regulations, just the hubris of egoic bad actors chasing first-mover advantage and money 💰 through the false promises of the Singularity myth… now needing trillions to free mankind! What a king-size delusion… another neoliberal Ponzi scheme to milk the public purse and steal money from unsuspecting investors. No regulations, no guard rails, and now the industry wants self-regulation? China 🇨🇳 is already way ahead in quantum computing: cheaper and far less energy-hungry than the 'proposed' Nvidia chips and GPU clusters, which will require massive energy. For what outcomes? For whom? The self-appointed 'gurus' who are trying to make a god from sand? The joke's on us if we believe this tripe… There are good and real uses of AI which deliver benefits, but monopolies are monopolies… the only metric is MONEY 💰 and its power of greed and control. Time to wake up, people.

D. D. Sharma

Explore AI (Healthcare, Safety, Risk). Board Advisor (UC Merced).

2mo

Stuart Russell Max Tegmark Srini Narayanan At first blush, the proposal maps the AI safety problem onto three sub-problems. This framework of a world model (simulation), safety specification, and verifier has been proposed in other safety-related domains (telecom, aviation, nuclear power, etc.) and has not scaled well. Even if we can scope it well, how will the framework address intentional malignant misuse, and the unknowns of the (N+1)st scenario that falls outside the safety and world-model speculations?

Sandeep Ozarde

Founder Director at Leaf Design; PhD Student at University of Hertfordshire

2mo

Stuart Russell Great paper indeed. I have a quick question from an academic standpoint: do you think that Human-Centered AI (HCAI) Design is crucial for achieving safety specifications, developing accurate world models, and implementing effective verifiers?

By involving users and stakeholders in the design process, AI systems can be tailored to meet their specific safety requirements. This includes considering potential risks, ethical considerations, and legal compliance. HCAI Design also focuses on understanding the "context" in which AI systems operate, incorporating human insights and expertise to create more accurate and comprehensive world models. The design approach ensures that the verifier is user-friendly and effective, taking into account the cognitive abilities and limitations of users and providing clear and intuitive interfaces.

For the AI system to be considered safe, a verifier (an automated tool or process) must check and confirm that the AI system can handle each of these potential behaviours within the specified safety requirements. The verifier must ensure that the AI system's responses and actions remain safe and compliant under all the different possible human behaviours outlined in the world model.
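To make the loop described above concrete, here is a minimal Python sketch of the decomposition: a world model that enumerates possible human behaviours, a safety specification expressed as a predicate, and a verifier that checks a policy against every behaviour the model produces. Every name here (Scenario, world_model, safety_spec, verifier, cautious_policy) is invented for illustration; the paper itself envisions formal, possibly probabilistic world models and proof-producing verification, not simple enumeration.

```python
# Toy sketch of the world-model / safety-spec / verifier decomposition.
# All names and numbers are hypothetical illustrations, not the paper's API.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Scenario:
    """One human behaviour drawn from the world model."""
    description: str
    pedestrian_speed: float  # m/s


def world_model() -> Iterable[Scenario]:
    # In the paper's framing this would be a rich model of the environment;
    # here it is just a finite enumeration of cases.
    yield Scenario("pedestrian waits at kerb", 0.0)
    yield Scenario("pedestrian walks across", 1.4)
    yield Scenario("pedestrian sprints across", 6.0)


def safety_spec(scenario: Scenario, braking_distance: float) -> bool:
    # Safety specification as a predicate: the vehicle must stop short of
    # the pedestrian's path. Deliberately simplistic numbers.
    return braking_distance < 10.0 - scenario.pedestrian_speed


def verifier(policy: Callable[[Scenario], float]) -> bool:
    # The verifier checks the policy against *every* behaviour the world
    # model generates; its guarantee covers nothing outside that set.
    return all(safety_spec(s, policy(s)) for s in world_model())


if __name__ == "__main__":
    cautious_policy = lambda s: 2.0  # always leaves a 2 m braking distance
    print(verifier(cautious_policy))  # True under this toy model
```

The sketch also makes the earlier objections in this thread concrete: the verifier quantifies only over behaviours the world model happens to contain, so any "guarantee" is exactly as strong as the model's coverage.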

Matthew Newman MAICD

Frontier Tech | Governance | AI Safety | Tech Strategy | Change & Impact | Founder TechInnocens

2mo

Great paper. I still hold that the task of providing the world model to ground AI is a task for humanity, if performed properly. I also argued that the role of curating the data that forms this world model, with all the complexity of the historic, the current, and the factual, is probably one of the most important we could imagine. Here: https://www.linkedin.com/posts/transform_to-the-curators-go-the-spoils-the-rising-activity-7111816574231056384-84z6?utm_source=share&utm_medium=member_desktop

Gary Longsine

Fractional CTO. Collaborate • Deliver • Iterate. 📱

2mo

I mean, sure, we •should• and while we’re at it, we should tax fossil fuels too. 🧐🤔🍨

Sebastián Rimsky - Strategic Leadership and Management

Commercial & Corporation Law | Credit Portfolios | Data Analysis | Financial Markets | Negotiation | Public Management | Organizational Development | Trainer | Strategic Leadership & Management | Assistant Professor

4w

Dear all, I invite you to collaborate on my project about the application of artificial intelligence 🤖 in credit portfolio management 💼 in the public sector. Your participation is crucial to the success of this research 🌟. Below, I share a questionnaire 📋 that will be of great value to my project: https://lnkd.in/daEWfGQ4. I thank you in advance for your time and support 🙏, and I kindly ask you to share this questionnaire as widely as possible 🔄. Thank you very much for your cooperation 🙌. Best regards, Sebastián Rimsky
