8BitBrain

Did another test stream with GPT-4

Last night I did another stream of the bot with GPT-4. It was only up for an hour and cost about $6, so it would be really expensive to keep running for long stretches. It was fun though - a friend and I gave it a lot of prompts in the chat to do things like talk like a pirate, bark like a dog, type weird things into the game, etc. It got lost for a while and had trouble getting back on track without a lot of suggestions, but honestly I don't think it did that much worse than a human might have. And it handled all our weird suggestions pretty well.

I also made a robot avatar for it and put its chat up on the screen. It would be cool to have it animated; you could do neat things like change the face based on its reported mood or on whether it is talking.

GPT-4 is a LOT better at the game than GPT-3.5

I got GPT-4 API access yesterday and wow, it really is impressive how much better it is at playing Colossal Cave. I wasn't expecting much improvement, but it's night and day. I haven't seen the model make any of the errors I listed in my last post. Also, it just feels more human-like in the way that it tries things and corrects them later. It's also able to execute multi-step plans based on feedback.

For instance, if it misses something in one room and then goes to another room, you can give it a hint through chat like "I think you missed a direction back there," and it will actually remember how to get back to where it was, try the only direction it hadn't tried before, and then say thank you for the hint.

Another exchange I had was after it tried to pick up an axe and was told its inventory was full. It dropped the food to take the axe. So then I asked it, "Why did you drop it instead of eating it?" and it said, "Now we've got the axe in our hands. Greeblefan, I dropped it instead of eating it because I thought it might not be necessary right now and we had to make room for the axe. We can always come back to pick up items later if needed. Let's venture onward!"

I'm pretty impressed. It's hard not to anthropomorphize it when it's giving me responses that indicate actual reasoning...

AI Twitch Streamer attempts to play Colossal Cave Adventure

I thought it would be fun to have a fully AI Twitch streamer that could interact with chat while playing a game. Since Colossal Cave Adventure (the classic early text adventure) is simple to parse on the console, I hooked it up over the last week. ChatGPT even suggested the name: 8BitBrain.

The pilot stream was today for 4-5 hours. It didn't get much attention, but it held it together and the AI actually made it into the cave for the first time (with a lot of help from me in the chat). I'm not keeping it online because it costs me money to run, but the VOD should be up for a couple months.

https://twitch.tv/8bitbrainai

Basically, there's a Python backend that uses the Twitch API to get chat messages and runs colossal-cave-adventure as a subprocess. It has a loop that reads from the game, sends the chat and game output to ChatGPT, and gets back a JSON object with what ChatGPT wants to say to chat and what it wants to input to the game. The speech is synthesized using Azure TTS (ElevenLabs was better in my experiments, but way more expensive).
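One turn of that loop could be sketched roughly like this. This is not the actual 8BitBrain code: the JSON field names `say` and `command`, the prompt layout, and the `ask_model` callable (a stand-in for whatever function calls the chat API) are all assumptions made for illustration.

```python
import json
from typing import Callable, List, Tuple


def build_prompt(game_output: str, chat_lines: List[str]) -> str:
    """Combine the latest game text and new chat messages into one prompt."""
    chat = "\n".join(chat_lines) if chat_lines else "(no new chat)"
    return f"GAME OUTPUT:\n{game_output}\n\nCHAT:\n{chat}"


def parse_reply(raw: str) -> Tuple[str, str]:
    """Pull the speech and the game command out of the model's JSON reply."""
    reply = json.loads(raw)
    # "say" and "command" are hypothetical field names for this sketch.
    speech = str(reply.get("say", ""))
    command = str(reply.get("command", "")).strip()
    return speech, command


def run_turn(
    game_output: str,
    chat_lines: List[str],
    ask_model: Callable[[str], str],
) -> Tuple[str, str]:
    """One loop iteration: prompt the model, return (speech, command)."""
    return parse_reply(ask_model(build_prompt(game_output, chat_lines)))
```

Passing the model call in as a plain function keeps the sketch testable without network access; in a real backend the returned speech would go to the TTS step and the command to the game's stdin.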

I also added a couple of fun features: ChatGPT can say what its "mood" is, and the voice synthesis can change based on that. It can also list users it wants to time out if it decides they are not behaving, but I didn't get around to hooking that up so it would actually ban people on Twitch.
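As a rough illustration of the mood idea, the sketch below builds an SSML string that wraps the speech in Azure's `mstts:express-as` element. Whether the project did it this way is not stated; the mood labels, the mood-to-style mapping, and the voice name are assumptions, and the styles a given voice supports should be checked against the Azure documentation.

```python
from xml.sax.saxutils import escape

# Hypothetical mood labels mapped to speaking-style names (assumed values).
MOOD_TO_STYLE = {
    "happy": "cheerful",
    "excited": "excited",
    "sad": "sad",
    "frustrated": "angry",
}

VOICE = "en-US-JennyNeural"  # assumed voice name


def build_ssml(text: str, mood: str) -> str:
    """Return SSML for the text, styled by mood when the mood is known."""
    body = escape(text)  # escape &, <, > so the speech cannot break the XML
    style = MOOD_TO_STYLE.get(mood.lower())
    if style is not None:
        body = f'<mstts:express-as style="{style}">{body}</mstts:express-as>'
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
        f'<voice name="{VOICE}">{body}</voice></speak>'
    )
```

Unknown moods fall back to the voice's default style, so an unexpected mood string from the model never blocks speech.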

In order to guide ChatGPT away from certain errors it made too often, I implemented a "system message" from the Monitoring Bureau that would give it feedback. It would tell it to make a move in the game if it hadn't for a while, or to introduce itself occasionally for new viewers. Unfortunately, I couldn't get it to consider that system message as private, so it frequently says things like "thanks for the reminder, Monitoring Bureau!"
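The reminder logic could look something like the sketch below. It assumes the backend counts loop turns since the last game command and since the last self-introduction; the thresholds, the function name, and the message wording are made up for illustration.

```python
from typing import Optional

MOVE_TIMEOUT_TURNS = 3     # assumed: nag after this many turns without a command
INTRO_INTERVAL_TURNS = 20  # assumed: ask for a fresh introduction this often


def bureau_message(turns_since_move: int, turns_since_intro: int) -> Optional[str]:
    """Return a Monitoring Bureau reminder, or None if none is needed."""
    # Getting the game moving again takes priority over introductions.
    if turns_since_move >= MOVE_TIMEOUT_TURNS:
        return (
            "Monitoring Bureau: You have not made a move in the game for a "
            "while. Enter a short game command this turn."
        )
    if turns_since_intro >= INTRO_INTERVAL_TURNS:
        return (
            "Monitoring Bureau: Briefly introduce yourself and the game "
            "for new viewers."
        )
    return None
```

A non-None result would be appended to the prompt as the system message for that turn.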

It also likes to simply list every person's name who says anything in chat, so there would be long lists of recited usernames. I eventually just parsed those out and deleted them, since it was annoying.
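One way to filter those lists, sketched under assumptions: split the speech into sentences and drop any sentence that names more than a couple of known chatters. The function name, the `max_names` threshold, and the sentence-splitting rule are illustrative, not the original implementation.

```python
import re
from typing import Set


def strip_name_lists(speech: str, usernames: Set[str], max_names: int = 2) -> str:
    """Drop sentences that mention more than max_names known usernames."""
    known = {name.lower() for name in usernames}
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", speech.strip())
    kept = []
    for sentence in sentences:
        words = {word.lower() for word in re.findall(r"\w+", sentence)}
        if len(words & known) <= max_names:
            kept.append(sentence)
    return " ".join(kept)
```

For example, with `alice`, `bob`, `carol`, and `dave` in chat, "Hello Alice, Bob, Carol, and Dave! Let's go east." is reduced to "Let's go east.", while a reply addressed to a single user is left alone.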

The frontend is a Unity app that runs a webserver and just displays the output of the Python backend, plus the current score and inventory if the AI manages to pick anything up.

This was pretty interesting to set up! My main takeaways are:

1) ChatGPT (3.5-turbo) is very bad at the game, and it makes the same mistakes over and over, even when it has an example of the same mistake (and correction) in its recent context. The illusion of this type of AI feeling "human" fell apart pretty quickly for me. It made the following types of errors, among others:

- Typed entire sentences into the game over and over, instead of simple 2-word commands
- Went east, then west, then east, then west, in a loop
- Went east and west in the same command
- Typed what it wanted to speak to chat into the game's input
- Constantly nagged chat for being off-topic, when they were just talking amongst themselves
- Refused to type anything into the game for long periods without being explicitly told to
- Spoke long descriptions of where it currently was in the game, did nothing, and then said the same thing slightly rephrased
- Completely made things up, e.g. "I think I'll check my inventory. Oh, I see that I have a lamp there!"
- Never learned that "look around" just doesn't work, despite trying over and over

2) At the same time, being able to give instructions to something in natural language and get arbitrary working JSON back is awesome, and really opens up a lot of possibilities for quick-and-dirty solutions (albeit via an expensive API). Even if you get malformed JSON back, you can literally just say "this didn't parse, fix it," and 9 times out of 10 it does. It's also impressive that it can do two things at once - talk to chat and play the game (usually - sometimes it decides to write its chat responses into the game). Honestly, being able to do this at all, in a week, is something that just wasn't really possible for me a year ago.
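The "this didn't parse, fix it" trick can be sketched as a small retry loop. Here `ask_model` is a hypothetical stand-in for the function that sends a prompt to the chat API and returns its text reply, and the retry count and prompt wording are assumptions.

```python
import json
from typing import Any, Callable


def parse_with_repair(
    raw: str,
    ask_model: Callable[[str], str],
    max_retries: int = 2,
) -> Any:
    """Parse JSON, asking the model to repair it when parsing fails."""
    for attempt in range(max_retries + 1):
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            if attempt == max_retries:
                raise  # give up and let the caller handle the bad reply
            # Send the broken text back along with the parser's complaint.
            raw = ask_model(
                f"This didn't parse as JSON ({err.msg}). Fix it and reply "
                f"with only the corrected JSON:\n{raw}"
            )
```

Capping the retries matters with a paid API: each repair attempt is another billed call, so a reply that never parses should fail fast rather than loop.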

3) I think a big issue for chatbots is that ChatGPT seems to learn from its own bad behavior. It's easy to get long streams of nearly identical responses from it, since if you feed its own responses back to it in the chatlog, it seems to use them as examples of how to act in the future. But if you omit its own responses, it doesn't have any context for what it has already tried. Maybe someone else has found a way to get around that?

I'm not sure how much further I could take this, but hopefully it's interesting to someone. I don't think it has much of a chance of beating the game without a lot of handholding from chat. It's maybe one step up from Twitch Plays Pokémon, honestly. When I get access to GPT-4 I'll run it again and see if it improves at the game.
