Prompting Techniques That Squeeze the Best Out of Your LLM

From the simplest to the most advanced, instruct your GPT for the best generation.


For a given input, you want the model to generate the correct output. There is a natural progression from the simplest, crudest prompts to full fine-tuning of the model. This post guides you through these techniques in a simple way.

Large language models (LLMs) like ChatGPT (GPT-3.5), Claude, and Bard are trained to predict text continuation, with extra tuning to follow conversations and instructions (for example with RLHF). We steer the model with a small amount of additional textual context, so that it learns in context without a large amount of training data. This additional context is called a prompt. Systematic development of prompts using metric evaluation is called prompt engineering.

Trade-offs in Prompting

  • Longer prompts are more expensive in terms of latency and compute. For example, the more examples you provide, the longer the prompt. Training a specific model or selecting examples intelligently are possible solutions.
  • If the model changes, the prompt may stop being optimal, so there is little point in over-optimizing it. For example, ChatGPT and GPT-4 are often changed by OpenAI. The models are meant to be general, not specific to your problem.
  • Controlling guardrails vs. creative hallucinations. Certain prompts are more prone to hallucinations than others.
  • Prompts are a crude tool without nuance and can be overridden by a user’s own instructions (prompt injection), whereas fine-tuning requires more initial investment and data, and is more complicated.

Task Instruction

Also called Zero-Shot Prompting.

Describe the task:

  • intent (detect product review sentiment)
  • audience (5-year-old)
  • persona (expert marketer)
  • specific and precise terms, e.g., avoid generic words such as “not” and state what the model should do instead.
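
The elements above can be assembled into a single instruction. The following is a minimal sketch, not a recommended template: the function name, the field labels, and the sample review are assumptions made for illustration.

```python
def build_zero_shot_prompt(persona, intent, audience, output_format, text):
    """Assemble a task instruction with no input-output examples."""
    lines = [
        f"Persona: {persona}",
        f"Task: {intent}",
        f"Audience: {audience}",
        f"Output format: {output_format}",
        f"Input: {text}",
        "Output:",
    ]
    return "\n".join(lines)


# Hypothetical values; only the intent, audience, and persona come from the list above.
prompt = build_zero_shot_prompt(
    persona="expert marketer",
    intent="detect product review sentiment",
    audience="5-year-old",
    output_format="one word, either positive or negative",
    text="The headphones broke after two days.",
)
print(prompt)
```

Stating the output format in positive terms ("one word, either positive or negative") is one way to follow the advice about precise wording.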

GPT-3 Zero-shot

Input-Output Examples

Also called One-Shot or Few-Shot Prompting.

Provide examples, keeping in mind:

  • Order matters: changing the order of examples can change results. Recent examples are more likely to be reproduced.
  • Representative ordering: order examples representatively, or at least randomly. For multiple-choice outputs, you may want to debias the model so that it does not simply repeat the last answer.
  • Similar or relevant examples: for each input, use a k-nearest-neighbor (kNN) search over embeddings to find semantically similar examples to include in the prompt.
  • Diverse examples: if you have a static prompt, select examples that are diverse with respect to each other, for instance with clustering.
  • Difficult examples: select the questions that are hardest for the model to answer.
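
Selecting similar examples can be sketched as follows. This is an illustration under loud assumptions: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the review pool is invented, and placing the most similar example last is just one possible ordering choice.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query, pool, k=2):
    """Return the k examples most similar to the query, most similar last."""
    q = embed(query)
    ranked = sorted(pool, key=lambda example: cosine(q, embed(example[0])))
    return ranked[-k:]


def build_few_shot_prompt(query, examples):
    """Format the selected examples followed by the unanswered query."""
    blocks = [f"Review: {x}\nSentiment: {y}" for x, y in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


# Hypothetical labeled pool.
POOL = [
    ("the battery died after one day", "negative"),
    ("great screen and great sound", "positive"),
    ("shipping was slow and the box was damaged", "negative"),
    ("the battery lasts all week", "positive"),
]
query = "the battery drains fast"
chosen = select_examples(query, POOL, k=2)
prompt = build_few_shot_prompt(query, chosen)
print(prompt)
```

Both battery reviews are selected here, one per label. Word overlap alone cannot tell which of them matches the sentiment of the query, which is why a semantic embedding model would replace `embed` in practice.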

Language Models are Few-Shot Learners

Reasoning in Steps

Also called Chain-of-Thought (CoT) Prompting.

Step-by-step instructions create momentum: the model first generates text that guides it towards the correct answer. The reasoning steps also increase interpretability. Append the instruction “Let’s think step by step.” or provide examples that include the reasoning. This helps, for example, with multistep arithmetic and commonsense logical reasoning. The model’s ability to use CoT increases with model size (see PaLM and its ability to explain jokes).
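
A minimal sketch of the zero-shot variant: append the trigger sentence, then parse the final answer out of the generated reasoning. The `Q:`/`A:` layout, the helper names, and the hand-written completion are assumptions for illustration; no model is called here.

```python
import re

COT_TRIGGER = "Let's think step by step."


def build_cot_prompt(question):
    """Append the reasoning trigger so the model writes its steps first."""
    return f"Q: {question}\nA: {COT_TRIGGER}"


def extract_final_answer(completion):
    """Return the last number mentioned in the completion, or None."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


question = "There are 3 boxes with 4 apples each. I eat 2 apples. How many are left?"
prompt = build_cot_prompt(question)

# Hand-written stand-in for a model completion.
completion = (
    "There are 3 boxes with 4 apples each, so 3 * 4 = 12 apples. "
    "Eating 2 leaves 12 - 2 = 10. The answer is 10."
)
print(extract_final_answer(completion))
```

Taking the last number is a crude heuristic. A stricter answer format requested in the prompt would make parsing more reliable.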

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Majority Vote Reasoning Steps

Also called Self-consistency with Chain-of-Thought (CoT-SC).

Generate multiple reasoning paths (chains of thought), then return the most common answer.
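
The voting step can be sketched as below. The sampler is a hypothetical stand-in that replays canned answers; in practice each call would sample a fresh chain of thought from the model at a non-zero temperature and parse out its final answer.

```python
from collections import Counter


def self_consistent_answer(sample_answer, question, n_samples=5):
    """Sample several final answers and return the most common one."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes, answers


def make_canned_sampler(canned):
    """Build a fake sampler that returns pre-written answers in order."""
    iterator = iter(canned)
    return lambda question: next(iterator)


sampler = make_canned_sampler(["10", "10", "12", "10", "8"])
winner, votes, answers = self_consistent_answer(
    sampler, "How many apples are left?", n_samples=5
)
print(winner, votes)
```

The cost scales linearly with `n_samples`, which ties back to the latency and compute trade-off of longer or repeated generations.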

Tree of Thoughts Problem Solving (ToT)

Generate explicitly decomposable thoughts, evaluate the progress of each unfinished thought chain, and explore efficiently with a search algorithm. This is analogous to how AlphaZero searches moves when playing chess.

Criticisms: the evidence is limited to only 3 toy examples, the evaluation operations require additional model generations, and the technique requires additional problem-specific human input.

Tree of Thoughts Problem Solving comparison with Chain-of-Thought

Thought Decomposition in ToT

Design a problem-specific, meaningful thought size and separation, for example a paragraph or an equation.

Tree of Thoughts: input, output, thoughts

Thought and Value Generation in ToT

Design a problem-specific thought-prompt. Either propose several thoughts in a single generation or sample them independently, depending on the thought size.

Evaluation-prompt in ToT

Design a problem-specific prompt for reflecting on the thought “state”. Either:

  • Value of a state: generate a value for a specific step or “chain”.
  • Vote across states: looking at all candidate steps, the model compares them and selects the most promising.

Search Algorithm in ToT

Explore the most promising paths until a solution, a bad state, or the depth limit is reached:

  • Breadth-first search (BFS): keep only the most promising states, generate the next level for all of them, then prune. Iterate.
  • Depth-first search (DFS): keep going deeper until a solution, a bad state, or the depth limit is reached. Then backtrack and exclude already visited states.
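
The breadth-first variant can be sketched as a beam search. Everything here is a simplified assumption: in ToT the `propose` and `evaluate` callables would be model calls using the thought-prompt and the evaluation-prompt, while this toy replaces them with plain arithmetic (reach 14 from 1 using “+3” and “*2”).

```python
def tot_bfs(root, propose, evaluate, is_solution, beam_width=2, max_depth=3):
    """Expand all kept states, return a solution if found, else keep the best few."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        # Prune: keep only the most promising states for the next level.
        frontier = sorted(candidates, key=evaluate, reverse=True)[:beam_width]
        if not frontier:
            return None
    return None


TARGET = 14


def propose(state):
    """Toy thought generator: each state is (value, steps so far)."""
    value, steps = state
    return [(value + 3, steps + ("+3",)), (value * 2, steps + ("*2",))]


def evaluate(state):
    """Toy value function: closer to the target is better."""
    return -abs(TARGET - state[0])


def is_solution(state):
    return state[0] == TARGET


result = tot_bfs((1, ()), propose, evaluate, is_solution)
print(result)
```

With a beam width of 2 the search keeps two partial chains per level, so a locally worse step can still lead to the solution, which a single greedy chain of thought cannot do.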

Examples in ToT

Game of 24

  • Game of 24 is a mathematical reasoning challenge, where the goal is to use 4 numbers and basic arithmetic operations (+-*/) to obtain 24. For example, given the input “4 9 10 13”, a solution output could be “(10 - 4) * (13 - 9) = 24”. We decompose by choosing the numbers from left to right.
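
To make the search concrete, here is a depth-first sketch on Game of 24. It assumes each thought is one intermediate equation that combines two of the remaining numbers, and it enumerates every possible equation by brute force instead of asking a model to propose and evaluate them, so it illustrates the tree structure rather than the ToT method itself.

```python
from fractions import Fraction
from itertools import combinations


def propose(numbers):
    """Yield (remaining_numbers, step_text) for each way to combine two numbers."""
    for i, j in combinations(range(len(numbers)), 2):
        a, b = numbers[i], numbers[j]
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        options = [
            (a + b, f"{a} + {b}"),
            (a * b, f"{a} * {b}"),
            (a - b, f"{a} - {b}"),
            (b - a, f"{b} - {a}"),
        ]
        if b != 0:
            options.append((a / b, f"{a} / {b}"))
        if a != 0:
            options.append((b / a, f"{b} / {a}"))
        for value, expression in options:
            yield tuple(sorted(rest + [value])), f"{expression} = {value}"


def solve_24(numbers, target=24):
    """Depth-first search with backtracking; returns the list of steps or None."""
    visited = set()

    def dfs(state, steps):
        if len(state) == 1:
            return steps if state[0] == target else None
        if state in visited:  # already explored without success
            return None
        visited.add(state)
        for next_state, step in propose(state):
            found = dfs(next_state, steps + [step])
            if found is not None:
                return found
        return None

    return dfs(tuple(sorted(Fraction(n) for n in numbers)), [])


solution = solve_24([4, 9, 10, 13])
print(solution)
```

Exact fractions avoid floating-point errors when a path goes through a division. The returned steps may differ from the example solution above, since several paths reach 24.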

Tree of Thought ToT: Game of 24 results

Creative Writing

Tree of Thought ToT: Creative Writing

Graph of Thoughts

Graph of Thoughts extends Tree of Thoughts with a human-specified ability to combine thoughts, on top of scoring and ranking thoughts. The core idea in both methods is reusing already generated thoughts, but Graph of Thoughts can also aggregate them.
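
The aggregation ability can be illustrated on a toy sorting task. This is only a sketch: `generate` stands in for a model call that sorts a small chunk, `score` and `aggregate` are hypothetical helpers, and the `parents` dictionary is an assumed way to record which thoughts each thought was built from.

```python
import heapq


def generate(chunk):
    """Stand-in for a model call that sorts one small chunk."""
    return sorted(chunk)


def score(thought):
    """Count adjacent pairs that are in order (higher is better)."""
    return sum(a <= b for a, b in zip(thought, thought[1:]))


def aggregate(first, second):
    """Combine two sorted thoughts into one; a tree has no such operation."""
    return list(heapq.merge(first, second))


data = [5, 2, 8, 1, 9, 3]
thoughts = {"left": generate(data[:3]), "right": generate(data[3:])}
thoughts["merged"] = aggregate(thoughts["left"], thoughts["right"])

# The merged thought has two parents, which makes the structure a graph.
parents = {"left": ["input"], "right": ["input"], "merged": ["left", "right"]}
print(thoughts["merged"], score(thoughts["merged"]))
```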

Generating Optimal Prompts

Models can be used to generate their own optimal prompts. For example, see Large Language Models as Optimizers.
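
The general shape of such an optimization loop is sketched below: keep a history of scored prompts, show it to a proposer, and score the new candidate. Both the proposer and the scorer are toy stand-ins invented for this example. In a real setup the proposer would be a model that reads the scored history, and the scorer would measure accuracy on a held-out evaluation set.

```python
def optimize_prompt(propose, score, seed_prompt, steps=3):
    """Iteratively propose prompts from the scored history and keep the best."""
    history = [(seed_prompt, score(seed_prompt))]
    for _ in range(steps):
        # Show the history sorted from worst to best, then score the candidate.
        candidate = propose(sorted(history, key=lambda pair: pair[1]))
        history.append((candidate, score(candidate)))
    return max(history, key=lambda pair: pair[1])


# Toy stand-ins, not measurements from any model.
HINTS = [
    "Think step by step.",
    "Show your work.",
    "State the final answer on its own line.",
]


def toy_propose(history):
    """Extend the best prompt so far with the first hint it lacks."""
    best_prompt, _ = history[-1]
    for hint in HINTS:
        if hint not in best_prompt:
            return f"{best_prompt} {hint}"
    return best_prompt


def toy_score(prompt):
    """Fraction of hints present; a placeholder for a real metric."""
    return sum(hint in prompt for hint in HINTS) / len(HINTS)


best_prompt, best_score = optimize_prompt(toy_propose, toy_score, "Solve the problem.")
print(best_score, best_prompt)
```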

Tool Use

Models can use external tools by generating API calls when that is advantageous, for example when a question requires a calculation. With the Toolformer method, a small training set is enough to teach the model to call a function such as a calculator. Instead of predicting the function’s output itself, the model pauses generation for a few tokens, the tool is executed, and its result is inserted into the text. This improves performance on dedicated tasks; for example, retrieval can be used for question answering.
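
The execution side of tool use can be sketched as a post-processing step: find the call markers in the generated text, run the tool, and splice in the result. The `[Calculator(...)]` marker format, the function names, and the sample sentence are assumptions for illustration. The calculator walks the parsed expression instead of calling `eval`, so only plain arithmetic is accepted.

```python
import ast
import operator
import re

OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression):
    """Evaluate +, -, *, / on numeric literals only."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return evaluate(ast.parse(expression, mode="eval").body)


CALL_PATTERN = re.compile(r"\[Calculator\((.*?)\)\]")


def run_tools(text):
    """Replace every calculator call in the text with its rounded result."""
    return CALL_PATTERN.sub(lambda m: str(round(calculate(m.group(1)), 2)), text)


# Hand-written stand-in for model output that contains a tool call.
generated = "Out of 1400 participants, 400 passed, a share of [Calculator(400 / 1400)]."
print(run_tools(generated))
```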


Fine-Tuning Training

Nuanced behavior and stronger prompt injection protection can only be achieved via fine-tuning. When we have enough data and compute, we can fine-tune the model weights to increase performance.

Parameter Efficient Methods

Parameter-efficient methods are cheaper to train, make it easy to switch between tasks, and help prevent catastrophic forgetting.


  • Soft prompts: training a section of input sequence embeddings.
  • Adapters
  • LoRA: Low-Rank Adaptation
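
To show why LoRA is cheap, here is a small pure-Python sketch of its forward pass: the pretrained weight matrix W stays frozen, and only two low-rank matrices A and B are trained, giving h = W x + (alpha / r) * B A x. The helper names and the tiny matrices are assumptions for illustration; real implementations use a tensor library.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """Frozen base output plus the scaled low-rank update B (A x)."""
    rank = len(A)  # A has shape (rank, d_in), B has shape (d_out, rank)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """Number of trainable values in A and B together."""
    return rank * (d_in + d_out)


W = [[1, 0, 0], [0, 1, 0]]   # frozen pretrained weights, shape (2, 3)
A = [[1, 1, 1]]              # trainable, shape (1, 3)
B_init = [[0], [0]]          # zero start, so the model is unchanged at first
B_trained = [[1], [2]]       # hypothetical values after training
x = [1, 2, 3]

print(lora_forward(W, A, B_init, x))
print(lora_forward(W, A, B_trained, x))
print(trainable_params(4096, 4096, rank=8), "vs", 4096 * 4096)
```

Because W is never modified, switching tasks only means swapping the small A and B matrices, and the original behavior is recovered by dropping them.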

Other Resources

Get more information from leading model providers:

Created on 08 Jun 2023. Updated on: 08 Jun 2023.
Copyright © Vaclav Kosar. All rights reserved. Not investment, financial, medical, or any other advice. No guarantee of information accuracy.