Good article. The rise of "prompt influencers" is frustrating, since it makes it a bit trickier to find the signal of interesting AI content through the noise. I miss when the influencers were all just hawking altcoins. Guess I'll stick to watching Two Minute Papers[0] and reading papers that ak[1] tweets.
> Being “good at prompting” is a temporary state of affairs.
Valid point. Sam A said he thinks prompt engineering won't even be a thing in 5 years. And in the shorter term, prompts that work with a specific model like GPT-4 might not work well for future models, or even updates to the same model.
That said, I see prompt engineering as the beginning of a new paradigm of "intent engineering"--where developers use AI to understand and anticipate user intent with minimal user effort. It'll be fun to see what that looks like in 5 years.
>Sam A said he thinks prompt engineering won't even be a thing in 5 years.
I don’t understand what this could mean. I have to do “prompt engineering” when I talk to other people all the time - when I need to ask them to do tasks, when I need to clarify requirements, etc. As long as we’re communicating through text and not mind reading, some level of “engineering” will be required. AI is intelligent but it’s not magic.
Yes. The problem isn't the AI not understanding the human's intent. The problem is the human not understanding what the human wants/thinking it knows what it wants but being wrong.
Watching whole communities crowd-source/stumble their way into basic reference question principles is kind of fun, though.
> The problem is the human not understanding what the human wants/thinking it knows what it wants but being wrong.
This is really the essence of why most software development is hard. Aside from a few problems whose solutions can be mathematically or logically defined, programmers are working on software to do things humans want to do to affect the external world in some way. They're what Meir M. Lehman calls E-programs: programs that model human and social activities. The program becomes part of the world it models, e.g. air traffic control.
Acknowledging the inability of humans to understand what they want shows the folly of trying to get all the requirements "up front" before starting the coding.
I see a general trend of developers wanting to put syntax and semantics back into prompting. This whole idea that we can get rid of formal languages entirely and replace them with natural languages isn't going to pan out; we have formal languages for a reason -- you don't have to guess the magic spell that will produce the output you want, you can get the output you want because you know how the system works. It's a far less frustrating and more consistent way of dealing with computers, unless you fancy yourself a bureaucrat.
If this actually happens, I would imagine that talking to an LLM would be a combination of formal syntax and natural language, so the task will look more like software engineering than black magic.
"This whole idea that we can just get rid of formal languages entirely and replace them with natural languages isn't going to pan out; we have formal languages for a reason -- you don't have to guess the magic spell that will cause the output you want, you can get the output you want because you know how the system works"
This reminds me of Inform 7 (where you can use "natural language" to program text adventure games) and "visual" programming languages like Pure Data.
They are fine for simple things, but when you want something complex or want to debug something they can become a nightmare.
LLMs are currently better in many ways than Inform 7, and they'll likely get better still, but there will likely still be a role for formal languages. Fortunately, LLMs can take formal language as input as well, and with the addition of plugins, they'll be able to execute programs written in formal languages.
I thought all the time I spent on interactive fiction was wasted, but it turns out Inform 7-ish NL syntax lends itself well to building minigames in GPT.
> Please function as a TTRPG engine.
> All characters start with 100 tokens representing LIFE, which can range from 0 to 100.
> Health scales with LIFE. At 0 LIFE, you are dead. At 100 LIFE, you are in peak physical condition.
> This reminds me of Inform 7 (where you can use "natural language" to program text adventure games)
The problem (at least, the problem I had) with Inform 7 is that it's not really natural language; it still has a strict formal grammar defining its syntax, it's just that the grammar has lots of "synonyms": multiple syntaxes for the same thing. But it's still following strict rules, and those rules can't cover everything. This leads to situations where you write some "plain English" and it seemingly understands it, but then you write a tiny variation and it throws up a syntax error.
> I have to do “prompt engineering” when I talk to other people all the time
Pulling from my hardware background...
When you design with FPGAs and write code in an HDL (Hardware Description Language; Verilog, VHDL), you learn to write code the compiler will use to infer the hardware structures you are after. Despite what it may seem, you are not programming; you are describing hardware and, in some cases, using idioms or structures that will result in the desired outcome.
In some ways prompt engineering can be thought of as learning how to cause the AI to deliver the desired result using the most efficient prompt text.
Prompt engineering now (as I understand it) is you're refining and refining the prompt itself. Perhaps what we might see is you give it a prompt then just keep adjusting the response dynamically, and then it can remember the entire concept as a saved prompt.
I already use ChatGPT like this. After every good outcome I say something like "modify my original prompt to include all the things we learned in this chat session". It has mixed results, the context window being the biggest factor I guess. Longer conversations (especially where I am trying to write stories) do not produce good prompts, as the result becomes super specific even if I tell it to generalize specific things.
Edit: I may have misunderstood. You instruct it to modify the original prompt of that session to include what you learned? Do you ask it to repeat the new prompt? This is interesting, but it wouldn't help for any future sessions.
---
It has zero effect. Your prompt consists of the OpenAI prompt + your conversation so far. There is no cross-prompt learning, knowledge or intelligence [except for, in the future, what OpenAI decides to include in model training of new releases]. The main reason for this is there are a limited number of tokens that can be used as the prompt. If you "include everything we learned in this chat session" it would have to include the entire chat session as part of all future prompts and you would quickly run out of tokens for the prompt. Training happens on a corpus, but not during regular use. Perceived "learning" is just context provided by the session-limited prompt.
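The mechanics the parent describes can be sketched as a context-assembly step: every request resends the system prompt plus as much of the chat history as fits a fixed token budget, and nothing persists across sessions. A minimal sketch, assuming a crude ~4-characters-per-token estimate (real APIs use a proper tokenizer, and the budget varies by model):

```python
MAX_TOKENS = 4096

def rough_tokens(text):
    # Crude estimate: ~4 characters per token (assumption for illustration).
    return max(1, len(text) // 4)

def build_request(system_prompt, history, new_message):
    """Assemble the messages actually sent on each turn."""
    messages = [{"role": "system", "content": system_prompt}]
    budget = MAX_TOKENS - rough_tokens(system_prompt) - rough_tokens(new_message)
    kept = []
    # Walk history newest-first, dropping the oldest turns once over budget.
    for msg in reversed(history):
        cost = rough_tokens(msg["content"])
        if budget - cost < 0:
            break
        budget -= cost
        kept.append(msg)
    messages.extend(reversed(kept))
    messages.append({"role": "user", "content": new_message})
    return messages
```

Anything that falls outside the budget is simply gone; the perceived "learning" is whatever made it into this list.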
I thought that was understood. I basically ask it to improve my initial prompt, like asking it how I could have asked better. Sometimes it gets good results, like when it taught me how to specify output formatting. I did not know how to make it output in a certain user-defined format; it never occurred to me that I could just give it an example, and the modified prompt taught me that trick. Now it's super common everywhere and I think it's the basis of a lot of langchain.
I think the intention is to eliminate the prompt component entirely. This would shift AI from "generative autofill" to a "repository of truth". The former is useless for most business applications. The latter is the holy grail of product.
I think it's easiest to compare writing prompts to writing SQL. You don't write SQL on a regular basis to interact with products (and with ORMs not even directly in your code anymore), it's been abstracted in many ways.
How can prompt engineering not be a thing so long as prompts are needed? If a different range of inputs produces a different range of outputs, how could there be no room for nuance? And how is intent engineering really any different? It's all based off a prompt input, right?
I think part of this comes from the fact that the same input will produce a different output.
Prompt engineering will be a thing, but I don't think it will be as prevalent as people think it will be. Look at projects like langchain. To me its biggest value is the library of standard prompts it provides. So I think prompt engineering will probably be a "niche" job the same way C programming is a "niche" job. Its super specialized and most people doing C programming are also experts in specific architectures they are programming in.
Saying that LangChain removes the value of learning to write prompts sounds to me like saying that the existence of ORMs removes the value of learning SQL.
I never said langchain "removes the value of learning to write prompts". My point is it abstracts it out enough that not everyone working with LLMs will need to know how to do it at a very high level. Just like most programmers can't write assembly/C, but we have tools which abstract it out so that experts can write/generate it for us. I don't know about SQL and ORM to respond to your analogy.
> the existence of ORMs removes the value of learning SQL.
Not that SQL has no value, but this is exactly what ORMs do for a lot of people. They stay in their language instead of having to learn SQL (not saying this is ideal).
Yeah, while all the other stuff is important, prompts are a big deal, especially for generative image tasks. They need to be just right for the checkpoint, but if you nail it, you can get an awesome image straight away. Sadly, most people just give up too quickly.
But with ChatGPT, even if you give it a terrible prompt, it can still "get" what you're trying to say. You can keep chatting until you get the image you want. It's way easier than with Txt2img, which can be pretty unforgiving.
And those courses? Total scam, don't bother with them.
I am likely wrong here, but I thought the term “prompt engineering” also referred to the act of building a system that generates a prompt dynamically from limited input? E.g. not just trial and error over single prompts, but using broader techniques like in-context learning, chain-of-thought reasoning, other AI models (BERT), vector DBs, etc. to build prompts to send to the LLM. I can't remember where I read this definition, but IMO it makes more sense that the term relates to building systems around prompting, rather than how a single model reacts in very limited circumstances. Models are going to change pretty regularly, and the tiny adjustments in wording will change along with them, so those seem to have limited ROI. Broader techniques will become widespread pretty quickly; how you use them all together in a system will, I think, matter more over time. But I don't know, everything is moving so quickly *shrugs*.
I don't think prompt engineering is temporary as long as we are using LLMs. Prompt engineering is about creating workflows that squeeze the most impact out of these tools. I don't see how that will go away.
The expectation is that we won't be prompting these models the way we do now down the road. As in: prompting is the command line of LLMs; at some point we'll get the equivalent of a GUI (either because we can be clueless about how to prompt because the LLM is so good, or because it's so good at eliciting your requirements, or because there is no prompt at all and you interface with the LLM completely differently).
You could foresee that under the covers there is always going to be some prompting, but it's going to be performed by only a few people?
I agree that there will be new and interesting abstractions for prompting, but I have this feeling that the promise of LLMs for the foreseeable future is to apply them to new and unique business cases. I think this will always require interacting with them at a lower level to some extent.
My experience is that LLMs are really good at interpreting prompts, so you really don't have to craft them any way to get "better" results.
The most important part is that the prompt has the necessary information so that it can be interpreted correctly. The other important part is, if you're trying to get a response you can feed to a computer (e.g. raw JSON), you need to really clearly specify this. And even then the current LLMs are really bad at stopping or providing invalid output; so bad, we may need to create another type of language model which takes LLM "English" output and converts it into raw data.
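A common workaround for that last point is to validate the output and re-prompt on failure, rather than trusting the model to stop cleanly. A minimal sketch; `call_model` is a hypothetical stand-in for an LLM API call, and the re-prompt wording is invented for illustration:

```python
import json

def extract_json(raw):
    """Pull the first JSON object out of model output that may be
    wrapped in prose or code fences."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    return json.loads(raw[start:end + 1])

def ask_for_json(call_model, prompt, retries=2):
    """Retry with the parse error appended if the reply isn't valid JSON."""
    for _ in range(retries + 1):
        raw = call_model(prompt)
        try:
            return extract_json(raw)
        except ValueError as e:  # json.JSONDecodeError subclasses ValueError
            prompt += f"\nYour last reply was not valid JSON ({e}). Reply with raw JSON only."
    raise RuntimeError("model never produced valid JSON")
```

This is essentially the "another model cleans up the first model's English" idea done with dumb string handling instead of a second model.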
LLMs are (or at least GPT-4 is) also really good at selective attention. Even if you slip in a subtle detail the model is good at picking it up.
> My experience is that LLMs are really good at interpreting prompts, so you really don't have to craft them any way to get "better" results.
That's my impression. I just write what I want ChatGPT to do, and refine according to the answers. Why has "engineering" been added to "prompt"? Why not simply "prompt writing"? Or "better prompt writing"? (Isn't it simply equivalent to "how to write better"?)
Finding the right wording and constraints to include is incredibly important when you’re submitting prompts through the API. The results have to be repeatable and conform to a template in order to be usable. Doubly so when using 3.5 which isn’t as good at guessing your intent. 3.5 is the only option for those without API access to 4 or for frequent or large requests that would cost a lot if using 4.
Perhaps there could be something like a formal language we could use to interface with the models, to help assuage the trickiness of natural language prompting.
We could declare certain pieces of the prompt as "variables," and certain defined operations that have proven deterministic output as "functions." All so that what we want the computer to do can be rigorously defined and tested.
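The "variables" half of that idea can be approximated today with ordinary string templating that fails loudly on a missing slot, so a prompt can be tested before anything reaches a model. A minimal sketch; the template text and field names are invented for illustration:

```python
import string

# A prompt template with declared slots ("variables").
TEMPLATE = string.Template(
    "Summarize the following $doc_type in at most $max_words words:\n$body"
)

def render_prompt(**variables):
    """Fail loudly if a declared variable is missing, instead of
    sending a half-filled prompt to the model."""
    try:
        return TEMPLATE.substitute(**variables)
    except KeyError as e:
        raise ValueError(f"missing prompt variable: {e}")
```

The "functions" half -- operations with proven deterministic output -- is harder, since nothing about the model's side of the exchange is deterministic; the template only pins down the input.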
You can write your prompts in code or formal logic or whatever if you want, assuming you’re capable of expressing your meaning sufficiently clearly with those languages too.
But the LLM won't interpret it formally; it doesn't know the semantics of any of the programming languages it can generate code for, it just "knows" that certain patterns occur more or less frequently. You can prompt it with invalid code and it will still sometimes give you a meaningful response.
Calling it "prompt engineering" feels more like a misnomer. It sounds too grandiose and technical, like calling someone a 'customer success architect'. It has an "artificially important" ring to it.
One feature LLMs could provide: feedback on how good the user's prompt was (e.g. how clear and unambiguous it was for the LLM to understand what the user is asking, or some other measure).
"Ah, such a clear and easy prompt for me to follow. Thank you, user."
There might be a tiny ghost in an LLM machine, but it's very unlikely that it has the degree of self-awareness required to assess whether or not a prompt was, in its own experience, unambiguous. It's a reflection of its training data, and it has no or little sense of self.
But it could, perhaps, compute something like the degrees of freedom the prompt allowed. If, in choosing how to complete the prompt, it had to choose between a huge number of different responses, perhaps that's an indication that the prompt is not precise enough? (Of course, even the most precise prompt is still going to allow for a huge number of variations in response, so you'd have to scale the actual numbers down quite a bit for them to be useful.)
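That intuition is roughly the Shannon entropy of the model's next-token distribution: a flat distribution means many comparably likely continuations (an imprecise prompt), a peaked one means few. A sketch of the measure with made-up probabilities; real use would read these from the API's log-probabilities:

```python
import math

def token_entropy(probs):
    """Shannon entropy (bits) of a next-token distribution. Higher
    entropy means the model had more freedom in choosing a continuation."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A precise prompt concentrates probability mass on a few tokens;
# a vague one spreads it out. (Numbers below are illustrative only.)
precise = [0.9, 0.05, 0.05]
vague = [0.25, 0.25, 0.25, 0.25]
```

As the parent notes, even a precise prompt leaves enormous freedom over a whole response, so a usable signal would need to aggregate and rescale this per-token number considerably.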
Most people using gpt directly will want to use gpt-turbo (very cheap!) or gpt4 (best model). Neither of those are currently available on the completion api endpoints, only on the chat endpoints. "Prompt" is the natural word.
99? That seems generous, but more likely I'm in the majority.
On a technical note, there isn't abstract comprehension going on. It's like the discovery that chess doesn't require intelligence -- what happens when creative people recognize their work doesn't either? ;p
[0] https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg
[1] https://twitter.com/_akhaliq