Back to Blog

Understanding the Anatomies of LLM Prompts: How To Structure Your Prompts To Get Better LLM Responses

While prompt engineering may or may not be an actual discipline, prompts are what we use to communicate with large language models (LLMs), such as OpenAI’s GPT-4 and Meta’s Llama 2. There are various ways to structure your prompts; some may be better suited for certain use cases. The better we can communicate with LLMs, the higher the probability of getting more relevant and consistent output from LLMs. 

In this article, we will explore:

  • LLM settings that affect the output
  • The distinct elements that make up a prompt
  • Various prompting techniques
  • Some limitations of prompting

AI ML Technical Leadership program

Large Language Models Settings

The available settings for adjustment will vary across LLMs. However, a couple of setting options are more universal across LLMs. LLM settings can be sensitive to change. The key is to begin with the default settings and progressively change them slightly until desired outcomes are achieved. Adjusting these settings can help guide the model towards the desired output.

Temperature

The temperature setting influences the predictability of an LLM's output. Essentially, a lower temperature yields a more predictable outcome while a higher temperature yields a less predictable one. Hence, when generating LLM output for more fact-based and conservative use cases, choose a lower temperature. For creativity and variety, increase the temperature.

When predicting the next word, or token, in a sentence, the LLM can select from a pool of tokens, each with an associated probability. Reading the phrase “The ocean is,” the token “blue” or “deep” is most likely to follow, while the token “furry” or “thin” is less likely to follow. When the temperature is low, the LLM picks the more probable token, say “blue.” Conversely, when the temperature is high, the LLM will consider picking less probable tokens, say “thin.” The temperature setting changes the shape of the probability distribution of the available tokens. As the temperature rises, the differences in the tokens' probability are reduced, so the LLM will consider selecting tokens that seemed less probable before.

A temperature of zero theoretically makes the LLM deterministic, meaning you get the same output every time with the same input, by limiting the LLM to only pick the word with the highest probability. In practice, this may not always be the case due to randomness in calculations being performed by the LLM's GPU (graphics processing unit).

Top-K and Top-P

Aside from temperature, Top-K and Top-P are two other settings for controlling the randomness of the LLM’s output by specifying how to select the next word or token. These settings are part of prompt engineering techniques used to refine AI responses.

Extending the example of “The ocean is”:

  • “blue” is 50%

  • “deep” is 25%

  • “furry” is 2%

  • “thin” is 1.8%

Top-K instructs the LLM to select the next token from the top “K” tokens in the list, sorted from highest to lowest probability. K specifies the size of the choices. The tokens outside of these choices are ignored. If K is set to 1, then the LLM will only pick the top token “blue.” If K is set to 2, then the LLM will randomly pick between “blue” and “deep.” Thus, increasing K will lead to more variety in the LLM response.

Top-P selects from the top tokens, sorted from highest to lowest probability, based on the sum of their probabilities. P specifies the threshold that the combined probabilities must exceed or equate. From the above example, if P is set to 0.5, then the LLM will pick “blue” which already meets the 50% threshold. If P is set to 0.70, then the LLM will randomly pick between “blue” and “deep” (50% + 25% exceeds the threshold). Hence, increasing P will lead to more diversity in the LLM response.

The Top-P method tends to be more dynamic because it considers the cumulative probabilities rather than the probabilities of individual tokens. The number of tokens in the selection set dynamically increases or decreases based on the next token’s probability distribution.

Top-K and Top-P can be used together to filter out low-ranked tokens and allow for some dynamic selection. Say K is set to 3 and P is set to 0.70, then the selection pool is first restricted to the top 3 tokens: “blue,” “deep,” and “furry.” The sampling tokens are further limited to “blue” and “deep” as their combined probabilities (50% + 25%) exceed the P threshold.

Maximum Length

The maximum length setting specifies the total number of tokens that the LLM can generate. This prevents the LLM from generating responses that are overly long and irrelevant. Depending on the LLM's pricing, restricting the length of the output tokens may lower the cost of API calls to the LLM. 

Role

Role is not a standard LLM setting. OpenAI's API for text generation models (e.g. GPT-4) accepts a list of messages as input and returns a model-generated message as output. The messages take on specific roles. The “role” property can take one of three values:

  • system - the high-level instructions to guide the model's behavior, such as specifying the personality of the assistant

  • user - the requests or queries for the assistant to respond to

  • assistant - the model's responses (based on the user messages)

Screenshot 2024-03-20 at 5.26.44 p.m.

A sample Chat Completions API call (accessed 2/22/2024). Source: https://platform.openai.com/docs/guides/text-generation/chat-completions-api?lang=node.js 

Elements of a Prompt

Screenshot 2024-03-22 at 9.43.50 a.m.

A table of prompt elements and their importance. Source: Author, based on Master the Perfect ChatGPT Prompt Formula (in just 8 minutes)! and https://www.youtube.com/watch?v=jC4v5AS4RIM 

 

Let’s go over the distinct elements that make up a prompt. Note that not all elements need to be used in every prompt. Additionally, designing effective natural language prompts is crucial for guiding LLMs in producing accurate outputs.

Instruction

The request or task that the LLM should perform. This is usually an action verb. For example, we can instruct the model to “analyze” collected feedback, “summarize” a report, “classify” sentiment, or “generate” headlines.

It is important to use separate instructions for clarity. This helps in effectively organizing the prompt by delineating different types of content such as instructions, examples, and questions.

User Input

The data that the LLM should perform the task on. If our instruction is to “classify the following text as positive, neutral, or negative,” then we need to provide the input text for the LLM to classify. For example, “The salesperson was helpful.”

Context

External or additional information can steer the model to more relevant responses by narrowing down the endless possibilities. For instance, if we are instructing the LLM to “generate 3 business names,” the LLM has no clue what type of business we are talking about. So we can provide context to the instruction to be more specific, such as “generate 3 business names for a social media agency targeting trendy, anime shops.”

Exemplar

Examples that are provided to guide the LLM. Jeff Su provided the following exemplar prompt: “Rewrite the provided bullet point using this structure: I accomplished X by measure Y resulting in Z.” We then give an actual example that follows the structure: “I onboarded new clients by 2 days faster on average resulting in a 23% increase in client satisfaction score.”

Persona

Specification on who the LLM should be when carrying out the instruction. The persona could be a seasoned social marketing expert with 12 years of experience who is witty. Or we can name a specific famous individual (real or fictional), such as Steve Jobs or Wonder Woman.

Output Indicator

Specification for the format and style of the LLM output.

Format

We can tell the LLM to “generate 3 business names that are composed of 2 words.” Jeff Su provided the following format prompt: “Analyze collected feedback based on the responsible team (sales, support, or product) and output in table format with column headers: Feedback, Team, and Priority.” Some common formats include emails, bullet points, tables, code blocks, markdown, JSON, and paragraphs.

Tone

The voice or style of the output. For instance, friendly, professional, funny, empathetic, etc.

Prompting Techniques

Static Prompts

A static prompt contains just “plain text with no templating, injection, or external information.” Exemplar is one of the prompt elements. We can provide the LLM with zero, one, or a few examples to help it learn. In general, including one or a few examples can improve the quality of the LLM’s responses, especially for more complex tasks. The key is coming up with informative, concise examples (2–5) that guide rather than overwhelm the model.

Zero-Shot Prompting

As modern LLMs are trained with a massive amount of data, they are capable of performing some tasks even without any example provided in the prompt. This is known as a zero-shot prompt and is the simplest type of prompt.

 

Screenshot 2024-03-20 at 5.37.54 p.m.

GPT-3.5-Turbo-0125 response to a zero-shot prompt (accessed 2/27/2024). Source: Author, based on https://www.promptingguide.ai/techniques/zeroshot

 

In the preceding zero-shot prompt example, although the prompt didn’t include any examples of input text and expected classification, the LLM was able to comprehend “sentiment” and classified it as neutral.

One-Shot Prompting

Here we try a one-shot prompt presented in OpenAI’s prompt engineering guide where a single example is offered. You can see the difference in the output quality in the one-shot prompt versus the zero-shot prompt with the same LLM settings. The one-shot prompt response followed the style of the provided example while the zero-shot prompt gave a matter-of-fact response.

Screenshot 2024-03-20 at 5.40.34 p.m.

Example of a one-shot prompt (accessed 2/24/2024). Source: https://platform.openai.com/docs/guides/prompt-engineering/tactic-provide-examples 

Screenshot 2024-03-20 at 5.41.54 p.m.

GPT-3.5-Turbo-0125 response to a one-shot prompt (accessed 2/24/2024). Source: Author, based on https://platform.openai.com/docs/guides/prompt-engineering/tactic-provide-examples 

Screenshot 2024-03-20 at 5.42.12 p.m.

GPT-3.5-Turbo-0125 response to a zero-shot prompt (accessed 2/24/2024). Source: Author 

Few-Shot Prompting

With a few-shot prompt, several examples that demonstrate a new concept, skill, or domain are provided to the LLM. The model then adapts and applies that knowledge to generate output. A common use case for the few-shot technique is when we need the output to be structured in a specific way that is hard to describe in words.

The following example is a few-shot prompt tasking the LLM to extract a list of names and occupations in “First Last [OCCUPATION]” format from each news article without explicit instruction.

Screenshot 2024-03-20 at 5.45.55 p.m.

Example of a few-shot prompt (accessed 2/27/2024). Source: Author, based on https://learnprompting.org/docs/basics/few_shot 

Screenshot 2024-03-20 at 5.46.16 p.m.

GPT-3.5-Turbo-0125 response to a few-shot prompt (accessed 2/27/2024). Source: Author, based on https://learnprompting.org/docs/basics/few_shot 

While a useful technique, few-shot prompting does have its limitations. It may not provide adequate context for complex tasks, resulting in possible inaccurate LLM responses. This technique is also highly dependent on the quality and relevance of the provided examples. This may require human input hence making the technique harder to scale.

Prompt Templates

Prompt templating converts a static prompt into a template by using placeholders (variables) to substitute for important inputs (values). The placeholders are then injected with the actual values at runtime before sending the prompt over to the LLM. With templating, LLM prompts can be programmed, stored, and reused. Prompt templates are useful when we are programmatically composing and executing prompts. The key is creating a structure that is clear and concise.

Screenshot 2024-03-20 at 5.46.44 p.m.

Example of a prompt template in Langchain JS with product placeholder (accessed 2/25/2024). Source: https://js.langchain.com/docs/modules/model_io/prompts/quick_start 

Contextual Prompts

Pre-trained LLMs, such as GPT-4 and Llama 2, were trained on publicly available online data and licensed datasets with a cutoff date, so they are unaware of more recent events and private data. To provide additional source knowledge without altering the underlying LLM, the Retrieval Augmented Generation (RAG) technique can be used. Any extra information outside of the LLM may be retrieved and passed along as part of the input context. Contextual data may come from multiple sources, such as vector or SQL databases, internal or external APIs, and document repositories. Contextual prompts are applicable when we want the LLM to perform a task requiring proprietary or recent information. The key is providing the right contextual information at the right time to the LLM.

The following example from OpenAI asks about the 2022 Winter Olympics, which is outside of the model’s knowledge base. Hence the LLM couldn’t answer the question. However, when provided with additional background, it provided the correct answer.

Screenshot 2024-03-20 at 5.47.02 p.m.

Example of GPT-4 response when asked a question outside its knowledge base (accessed 2/25/2024). Source: Author, based on https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb 

Screenshot 2024-03-27 at 11.27.01 a.m.

Example of GPT-4 response when asked a question outside its knowledge base provided with additional contextual information (accessed 2/25/2024). Source: Author, based on https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb 

Chain-of-Thought (CoT) Prompting

CoT prompting is a technique that steers the LLM to follow a reasoning process when dealing with complex problems. It resembles the few-shot prompting method but differs in that the examples provided illustrate step-by-step thinking, not just the end results.

CoT prompting is best suited for tasks involving advanced reasoning, like coding, mathematics, and common sense. The key is providing clear and specific steps to help the model fill in the gaps in its reasoning.

Screenshot 2024-03-20 at 5.55.55 p.m.

Example of one-shot prompting versus CoT prompting. Source: https://arxiv.org/abs/2201.11903  

The following example didn’t use the CoT prompting technique and the LLM correctly added the odd numbers to 41, but failed to detect that the assertive statement “The odd numbers in this group add up to an even number” is false.

Screenshot 2024-03-20 at 5.56.16 p.m.

GPT-3.5-Turbo-0125 response to an arithmetic query without CoT prompt (accessed 2/26/2024). Source: Author, based on https://www.promptingguide.ai/techniques/cot 

With the CoT prompting technique, the LLM correctly added the odd numbers to 41 and also detected that the assertive statement “The odd numbers in this group add up to an even number” is false.

Screenshot 2024-03-20 at 5.56.34 p.m.

GPT-3.5-Turbo-0125 response to a CoT prompt (accessed 2/26/2024). Source: Author, based on https://www.promptingguide.ai/techniques/cot 

Zero-Shot Chain-of-Thought (Zero-Shot CoT) Prompting

The Zero-Shot CoT prompting technique doesn’t include any intermediary steps. It simply adds “Let’s think step by step” to the original prompt. This additional instruction cues the model to break down complex subjects. We can try the Zero-Shot CoT prompting approach before the CoT prompting approach. This is helpful when we don’t have many examples to include.

Using the previous arithmetic query with the Zero-Shot CoT prompting technique, the LLM correctly added the odd numbers to 41 and also detected that the assertive statement “The odd numbers in this group add up to an even number” is incorrect. However, without the examples, it didn’t respond in a consistent output format.

Screenshot 2024-03-20 at 6.03.03 p.m.

GPT-3.5-Turbo response to a Zero-Shot CoT prompt (accessed 2/26/2024). Source: Author, based on https://www.promptingguide.ai/techniques/cot 

However, LLMs are evolving and getting more capable. Given just a zero-shot prompt, GPT-4 seems to apply the “Let’s think step by step” automatically in its response. However, the response contained contradictory statements of “The statement is true” and “So, the statement is incorrect.”

Screenshot 2024-03-20 at 6.03.18 p.m.

GPT-4 response to a Zero-Shot prompt (accessed 2/26/2024). Source: Author, based on https://www.promptingguide.ai/techniques/cot 

Automatic Chain-of-Thought (Auto-CoT) Prompting

The CoT prompting technique requires human effort in creating examples demonstrating the intermediary steps, which can introduce mistakes and suboptimal solutions. The Auto-CoT prompting technique automatically generates the intermediate reasoning steps. Auto-CoT prompting requires more setup to use and is an advanced technique.

Auto-CoT has two main stages:

  1. Question Clustering: Partition a diverse dataset of questions into a few clusters.

  2. Demonstration Sampling: Pick a representative question from each cluster and apply the Zero-Shot CoT prompting technique to auto-generate its reasoning path.

Screenshot 2024-03-20 at 6.03.33 p.m.

Illustration of Auto-CoT process. Source: https://arxiv.org/abs/2210.03493 

Multimodal Chain-of-Thought (Multimodal-CoT) Prompting

As multimodal LLMs, such as GPT-4V which accepts text and image as input, become accessible, the Multimodal-CoT prompting technique comes into play. Multimodal-CoT prompting utilizes words and pictures to demonstrate the reasoning steps. As this is an advanced technique, technical setup is required for Multimodal-CoT prompting.

Multimodal-CoT consists of two stages:

  1. Rationale Generation: Provide text and vision (image) inputs to the model to generate rationales.

  2. Answer Inference:  Append the original text input with the generated rationale to get the updated text input. Then provide the updated text input with the original vision input to the model to infer the answer.



Screenshot 2024-03-20 at 6.03.50 p.m.

Overview of Multimodal-CoT framework. Source: https://arxiv.org/abs/2302.00923 

Limitation of Chain-of-Thought Prompting

While the CoT prompting technique is helpful for complex reasoning tasks, the main limit is that “there is no guarantee of correct reasoning paths” given the lack of transparency into the model’s reasoning mechanism. Hence the reasoning paths may lead to right and wrong answers.

Tree of Thoughts (ToT) Prompting

The ToT framework mirrors the learning approach of trial and error, by exploring possible paths and self-evaluating along the way to reach an optimal solution. The model generates and maintains a tree of thoughts, where each thought represents an intermediate step in a sequence that solves a problem. Along the way, the model self-evaluates the thoughts and decides to stay on the current path or pick an alternative. Search algorithms, such as breadth-first search and depth-first search, allow the model to systematically explore thoughts with lookahead and backtracking.

Screenshot 2024-03-20 at 6.04.06 p.m.

A diagram comparing ToT with other prompting techniques. Source: https://arxiv.org/abs/2305.10601 

A diagram comparing ToT with other prompting techniques. Source: https://arxiv.org/abs/2305.10601

ToT prompting is derived from the ToT framework and is an advanced technique for tackling complex tasks, like mathematical and logical reasoning.

Here is an example complex question by Dave Hulbert:

“Bob is in the living room. He walks to the kitchen, carrying a cup. He puts a ball in the cup and carries the cup to the bedroom. He turns the cup upside down, then walks to the garden. He puts the cup down in the garden, then walks to the garage. Where is the ball?”

When using the Zero-Shot CoT technique with GPT-3.5-Turbo-0125, an incorrect answer was given that the ball was in the garden.

Screenshot 2024-03-20 at 6.15.21 p.m.

GPT-3.5-Turbo-0125 response to a Zero-Shot CoT prompt (accessed 2/26/2024). Source: Author, based on https://github.com/dave1010/tree-of-thought-prompting 

The key to the ToT prompting technique is crafting an effective ToT-based prompt. Dave Hubert proposed the following ToT-based prompt based on the ToT framework:

“Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realizes they’re wrong at any point then they leave. The question is…”

Providing Hubert’s ToT-based prompt to GPT-3.5-Turbo-0125 yielded the correct answer that the ball is in the bedroom.

Screenshot 2024-03-20 at 6.15.38 p.m.

GPT-3.5-Turbo-0125 response to a ToT prompt (accessed 2/26/2024). Source: Author, based on https://github.com/dave1010/tree-of-thought-prompting 

However, GPT-4 answered correctly with a zero-shot prompt that the ball is in the bedroom. So more experimentation and observation are needed to determine whether ToT prompting will be necessary as LLMs improve their reasoning capability.

Screenshot 2024-03-20 at 6.15.52 p.m.

GPT-4 response to a complex zero-shot prompt (accessed 2/26/2024). Source: Author, based on https://github.com/dave1010/tree-of-thought-prompting 

Limitations of Prompt Engineering

While prompting techniques are useful, there are limitations.

  • Prompting variances among different LLMs: Even with the same prompt, we saw different responses from GPT3.5-Turbo and GPT-4 in the earlier examples. GPT3.5-Turbo and GPT-4 are LLMs from the same company OpenAI. There may be even more variances when we execute the same prompt with LLMs created by other organizations, which have differences in their underlying architecture and training dataset. We will need to test and tweak prompts when using them with different LLMs.

  • Risks in designing prompts: In the prompting techniques covered in the earlier sections, we provided examples, contextual information, and intermediary steps to the LLMs. We have to be mindful of bias and inaccuracy in the data we provide to the LLMs. Although the organizations owning the LLMs may not use our provided data to train their LLMs, the biases and inaccuracies in our input data may propagate into the LLM outputs. Remember that garbage in, garbage out. (Note: At the time of writing, OpenAI states that “API and Playground requests will not be used to train our models.”)

More on LLM Prompt Engineering

What is LLM in prompt engineering?

LLM stands for Large Language Model, used in prompt engineering to generate responses based on structured input prompts, helping to guide the model toward accurate and relevant output.

What are system prompts in LLM?

System prompts guide the model's behavior. They define high-level instructions, such as setting the assistant's tone, expertise, or limitations, helping tailor responses in line with user needs.

What are the three types of prompt engineering?

The three common types of prompt engineering are zero-shot prompting, one-shot prompting, and few-shot prompting, each providing varying levels of examples to guide the LLM in generating accurate responses.

How do LLMs interpret prompts?

LLMs interpret prompts by analyzing the input, recognizing instructions, user data, and context, and then generating responses based on patterns learned during training, while applying relevant settings like temperature or max length

LLM Prompt Architecture: Next Steps

Mastering prompt structure is crucial to communicating effectively with LLMs and optimizing their responses. By understanding LLM settings, prompt elements, and advanced techniques like few-shot prompting and Chain-of-Thought, you can guide these models to deliver more accurate and consistent output.

However, prompt engineering is an iterative process. Different models, use cases, and complexities may require experimentation and adjustment. Always remember that clear, well-structured prompts reduce ambiguity and bias, leading to more reliable results.

As LLMs evolve, so will the techniques for prompting them. Stay adaptable and continue refining your approach to unlock the full potential of these powerful tools.

 

Diana Cheung (ex-LinkedIn software engineer, USC MBA, and Codesmith alum) is a technical writer on technology and business. She is an avid learner and has a soft spot for tea and meows.

 

AI ML Technical Leadership program