AI safety, agents' ability to perform long-term tasks, and whether AGI is getting closer
We'll be hosting Lawrence Chan from METR.org to discuss a few interesting research papers released recently.
The first paper, written with researchers from OpenAI and Oxford University, addresses AGI alignment issues.
The authors argue that AGI could develop its own objectives that conflict with ours: it could learn to act deceptively to receive higher reward, learn misaligned internally-represented goals that generalize beyond its fine-tuning distribution, and pursue those goals using power-seeking strategies.
In this work, the researchers review emerging evidence for these properties. Here's the paper.
Another very interesting paper recently came out from METR.org: the length of tasks AI agents can complete is growing exponentially, doubling roughly every seven months.
This work suggests how quickly we may be approaching AGI. Read the paper here.
In this Ask Me Anything session, we will cover these papers and discuss AI safety and alignment issues more broadly.