o1 is different, and even sceptics are calling it a ‘large reasoning model’. But why is it so different, and what does that say about the future? This is what happens when models are rewarded for the correctness of their answers, not just for harmlessness or predicting the next word. But does even o1 lack spatial reasoning? How did the White House react yesterday? And did Ilya Sutskever warn of o1 getting … ‘creative’?
AI Insiders – Now $9: https://www.patreon.com/AIExplained
Chapters:
00:00 – Intro
01:04 – How o1 Works (The 3rd Paradigm)
03:10 – We Don’t Need Human Examples (OpenAI)
03:54 – How o1 Works (Temp 1 Graded)
06:28 – Is This Reasoning?
08:48 – Personal Announcement
11:27 – Hidden, Serial Thoughts?
13:11 – Memorized Reasoning?
15:40 – 10 Facts
o1, Learning to Reason – https://openai.com/index/learning-to-reason-with-llms/
Species Tweet: https://x.com/nickcammarata/status/1835763847430631692
Noam Brown Video: https://www.youtube.com/watch?v=eaAonE58sLU
https://x.com/polynoamial/status/1834280155730043108
2021 Paper on Verifiers: https://arxiv.org/pdf/2110.14168
Let’s Verify Step By Step: https://arxiv.org/pdf/2305.20050
DeepMind Not Far Behind: https://arxiv.org/pdf/2211.14275
Chain of Thought for Serial Problems: https://arxiv.org/pdf/2402.12875
Q* Clues (yes, I am proud of that one): https://www.youtube.com/watch?v=ARf0WyFau0A&t=5s
Let’s Think Pixel by Pixel: https://x.com/geoffreyirving/status/1835049328349467099
Or Dot by Dot (the general power of CoT): https://arxiv.org/pdf/2404.15758
RL by Karpathy: https://x.com/karpathy/status/1835561952258723930
Not Prompt Engineering: https://x.com/sytelus/status/1835433363882270922
ARC-AGI Analysis: https://arcprize.org/blog/openai-o1-results-arc-prize
Reality Foundation Models: https://x.com/armandjoulin/status/1833432576813330681
Memorising Reasoning vs CoT Q-Table: https://x.com/rao2z
When You Know the Right CoT: https://x.com/lukaszkaiser/status/1835806956789146049
Original Information Report: https://www.theinformation.com/articles/openai-made-an-ai-breakthrough-before-altman-firing-stoking-excitement-and-concern?rc=sy0ihq
StockFish: https://en.wikipedia.org/wiki/Stockfish_(chess)
Will AGI fall like chess: https://x.com/polynoamial/status/1835151179476812053
Fei-Fei Li Start-up $1B: https://www.ft.com/content/0b210299-4659-4055-8d81-5a493e85432f
White House Report: https://www.whitehouse.gov/briefing-room/statements-releases/2024/09/12/readout-of-white-house-roundtable-on-u-s-leadership-in-ai-infrastructure/
Simple-Bench: https://simple-bench.com/try-yourself.html
o1 Fails: https://www.reddit.com/r/ChatGPT/comments/1ff9w7y/new_o1_still_fails_miserably_at_trivial_questions/
My New Coursera Course! The 8 Most Controversial Terms in AI: https://imp.i384100.net/m57g3M
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
GenAI Hourly Consulting: https://www.theinsiders.ai/
I use Descript to edit my videos: https://get.descript.com/ldgxfuj2bhnb
Many people expense AI Insiders for work. Feel free to use this template: https://docs.google.com/document/d/1l_0RmVE4SYiAkY63LVvkcnyMwj4v7ZhGOKq77JYgrU8/edit#heading=h.nuqqxloqqhim