o1. Strawberry. Q-Star. We finally get answers to what the next generation of LLM reasoning abilities will bring us. And it’s good. Better than I thought. Simple Bench partial results, plus an analysis of the full OpenAI system card, press release, new capabilities, benchmarks and far, far more. It is a step-change, and no, that’s not just hype. Welcome to the o1 era.
AI Insiders: https://www.patreon.com/AIExplained
ChatGPT Status: https://x.com/ChatGPTapp/status/1775684027774836873
Mini, o1: https://x.com/willdepue/status/1834294935497179633/photo/2
System Card: https://cdn.openai.com/o1-system-card.pdf
o1 Intro: https://openai.com/index/introducing-openai-o1-preview/
o1 Learning to reason: https://openai.com/index/learning-to-reason-with-llms/
Unfaithful ‘Thoughts’: https://www-cdn.anthropic.com/827afa7dd36e4afbb1a49c735bfbb2c69749756e/measuring-faithfulness-in-chain-of-thought-reasoning.pdf
Noam Brown Tweet: https://x.com/polynoamial/status/1834280305076961732/photo/1
Human-level Reasoning? https://x.com/max_a_schwarzer/status/1834278973154754711
Can Do Logic Puzzles Now? https://www.youtube.com/watch?v=3BkQI3nIiB8
Brockman Sabbatical Tweet: https://x.com/gdb/status/1834295775674990676
Still Falls Victim to 9.11: https://x.com/SuvanshSanjeev/status/1834290199276179858
Previous Apollo Research on ‘Deception’: https://arxiv.org/pdf/2311.07590
Simple Bench: https://simple-bench.com/index.html
AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf
Chapters:
00:00-26:55: o1 ChatGPT – OpenAI
My New Coursera Course! The 8 Most Controversial Terms in AI: https://imp.i384100.net/m57g3M
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Many people expense AI Insiders for work. Feel free to use this template: https://docs.google.com/document/d/1l_0RmVE4SYiAkY63LVvkcnyMwj4v7ZhGOKq77JYgrU8/edit#heading=h.nuqqxloqqhim