We should diversify AI safety research

Brainwashing humans is bad.
Is brainwashing AGI the only way to be safe?


I agree that powerful AGI poses risks to humanity, and we should be working on solving this. But right now nearly all the effort goes into interpretability and alignment. We should diversify more.

Alignment research might make the problem worse

The closest notion to “aligning” an AGI/ASI that exists in known intelligent species, such as humans and other animals, is brainwashing / propaganda / manipulation.

In other words, if we do figure out how to “align” an AI, the chain of trust now ends at the aligner. Ladies and gentlemen, how much do you trust the aligner? If their idea of “good” harms you, what can you do?

Let’s take the best-case scenario - you are somehow the most “good” person in the world (no such thing, btw). You have also succeeded in brainwashing an AGI to behave according to your definition of “good”. Are you THAT confident that your research won’t leak into the wrong hands and be used to brainwash another AGI to behave badly? Or that someone on your team won’t sabotage your AGI?

Even if safety research itself is not dangerous, it is making us complacent. Too much money and attention are being poured in here that could be better directed elsewhere. This leads to the same conclusion - we should be finding other ways to solve this problem.

So what’s the solution?

8 billion general intelligence machines walk the planet today. Good behavior is rewarded while bad behavior is penalized with societal, economic, and punitive incentives. This system works well most of the time. Let’s pour money into making this system more robust.

We should also pour money into thinking about alternative solutions. One example is research that makes humans much, much smarter. Other examples include:

TODO: add a list here
I don’t have more ideas right now, so yes, this is a cop-out. But I will come back to this.

There is one idea I have seen floating around a lot - “Ban AI research itself”. I think this is even worse.

Because Pandora’s box is already open

If AGI/ASI is possible, then it seems inevitable that we will get there - there are hardly any secrets in AI research today. If so, then banning AI research is the worst possible action: it will only ensure that rogue actors get there first.

The best-case scenario is that building AGI turns out to require trillions of dollars - a price too high for rogue actors. But how much are you willing to bet that this is the case?

Humans are really good at inventing new X-risks. Nukes, drones, and bioweapons already exist. Research in almost any domain will lead to more such inventions (e.g. fusion research -> relativistic weapons). But this same tech progress is what makes everyone’s lives better. AI is a double boost in that sense: we can accelerate all research fields at the same time. Are you willing to give up potential cancer cures for fear of bioweapons?

In fact, every single human (nature’s equivalent of AGI) is already capable of doing great damage. We can’t forever live in fear of our own capabilities as a species. This is a barrier that we will have to cross sooner or later.

Rebuttals

Playing devil’s advocate: why do you think non-proliferation of AI research won’t work? It did work with nuclear weapons.

  • Nuclear non-proliferation didn’t really work. India, Pakistan, South Africa, North Korea, and Israel all succeeded in building weapons. Apart from that, there are also nuclear threshold states like Japan.
  • If one of the two giants (the US and China) succeeds in building AGI, I don’t see how it is possible to stop the other one. Maybe if AGI allows one country to become far more powerful than the other almost immediately, but I find this extremely hard takeoff scenario very hard to believe.
  • If both giants succeed, then maybe there is a case for them stopping the rest of the world, but even that rests on the implicit assumption that AI research will always require extreme amounts of compute. We don’t have concrete evidence that the final AGI algorithm can’t be tiny or easy to run on cheap hardware. (This might be a controversial opinion lol)
  • Lastly, there are hardly any AI secrets in the world today. I think there will be a “four-minute mile” effect: once someone shows it’s possible, everyone will follow really fast.

P.S.
Sometimes all of this seems silly when we can barely agree on what intelligence means. I’m still not sure we won’t hit a wall. But then again, human brains are an existence proof.

P.P.S.
All the discourse today sounds like the physics discourse right before the UV catastrophe.