Effective prompting of large language models is not about memorizing hacks from YouTube. It is about understanding how transformers actually process input and structuring your communication accordingly. A recent guide from the Mira OS team lays out four practical pillars for getting better results from both reasoning and non-reasoning models.
The four pillars
1. Articulate your intent clearly using domain-specific language
Plan the conversation before you start. Know your intent, task, or question, and identify which clarifying inputs will get you closer to the answer. These models are probabilistic: tighten the probability cone of the next turn's tokens by asking questions whose answers you expect to land in the neighborhood you want.
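You can see this narrowing directly. Here is a minimal sketch, not from the guide, using GPT-2 via Hugging Face transformers only because it is small and public: it compares how the next-token distribution concentrates for a vague prompt versus a specific one.

```python
# Minimal sketch (not from the Mira guide): compare how sharply the
# next-token distribution concentrates for a vague prompt versus a
# specific one. GPT-2 is used only because it is small and public.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def top_next_tokens(prompt: str, k: int = 5):
    """Return the k most probable next tokens with their probabilities."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]  # logits at the last position
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tokenizer.decode(t), round(p.item(), 3))
            for t, p in zip(top.indices, top.values)]

# A vague prompt spreads probability mass across many continuations;
# a specific prompt concentrates it.
print(top_next_tokens("Tell me about"))
print(top_next_tokens("The capital of France is"))
```

The specific prompt piles most of the probability mass onto a handful of tokens; that concentration is the cone you are trying to tighten.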
Avoid dumping large amounts of waterfall context early in the conversation. The model attaches to and interprets every word you use. More words increase the chance of misinterpretation. The recommended approach: pretend you are an eccentric millionaire dictating a letter to an unpaid intern. Short, direct, specific.
For reasoning models (like Qwen 3.6 or Gemma 4), this approach is especially effective. Mira's system default model has been switched from Opus 4.6 to Gemma4:26bA4b because the team found it performs better. The author reports coding almost exclusively with Qwen 3.6 now, because its output is comparable and it runs entirely free on a local machine.
Non-reasoning models inside LLM pipelines must be treated differently. Prompt engineering for small non-reasoning models is closer to compiler design than to writing: you are programming a pattern matcher, not persuading a reasoning agent. Every token is an instruction, every example is a template, every delimiter is a structural signal. Use /nothink to suppress reasoning when you need predictable, deterministic output. IBM Granite 4.1 is cited as a good example of a boring, efficient transformer for tasks like parsing a list and extracting JSON, as in the sketch below.
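A hedged sketch of that compiler mindset, assuming a local Ollama server and its Python client. The model tag, the extract_prices helper, and the delimiter scheme are illustrative choices, not confirmed names from the guide. The /nothink soft switch is included per the guide's advice; a model without a thinking mode just sees it as plain text.

```python
# Sketch of the "compiler design" approach for a small non-reasoning
# model: rigid delimiters, one worked example as a template, and an
# explicit output contract. Assumes the ollama Python client and a
# locally pulled model; the tag below is illustrative, not a
# confirmed Granite release name.
import json
import ollama

PROMPT = """/nothink
Extract every product and its price from the input as JSON.
Output ONLY a JSON array. No prose, no markdown.

### EXAMPLE INPUT
- Widget: $4.99
- Gadget: $12.50
### EXAMPLE OUTPUT
[{"name": "Widget", "price": 4.99}, {"name": "Gadget", "price": 12.50}]

### INPUT
<<ITEMS>>
### OUTPUT
"""

def extract_prices(items: str) -> list[dict]:
    """Run the template against a local model and parse the JSON reply."""
    response = ollama.generate(
        model="granite4.1",            # illustrative tag, swap in your own pull
        prompt=PROMPT.replace("<<ITEMS>>", items),
        options={"temperature": 0},    # favor determinism over creativity
    )
    return json.loads(response["response"])

print(extract_prices("- Sprocket: $7.25\n- Flange: $3.10"))
```

Note the design choice: the worked example does the heavy lifting, because a pattern matcher imitates templates far more reliably than it follows abstract instructions.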
2. Railroad the model into going where you want in conversation
Large language models do not think linearly. They load everything into their mind at once, then dump a response. Attention is effectively zero-sum: every irrelevant token is another surface the model can grab onto instead of the thing you actually care about.
Lost-in-the-middle is real but not about the context window per se — it is about the attention window. If you saturate the tokens the model is attending to with irrelevant junk, it cannot find what you are looking for. The shorter the total context, the better the odds that attention will look in the right place.
The author built an application called TeaLeaves for visualizing per-layer attention on a live heatmap. With poorly formed directions, the model keeps checking back at tokens that have nothing to do with the task at hand.
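A rough stand-in for the same kind of inspection, not the author's TeaLeaves code: Hugging Face models can return per-layer attention weights, which matplotlib can render as a heatmap. GPT-2 and the example sentence are assumptions chosen only to keep the sketch small and runnable.

```python
# Rough sketch of per-layer attention inspection (not the author's
# TeaLeaves code). GPT-2 is used only because it is small and public.
import torch
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", attn_implementation="eager"  # eager attention can return weights
)

text = "Ignore the weather. Summarize the quarterly revenue figures."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions holds one (batch, heads, seq, seq) tensor per layer.
layer = 5                                    # an arbitrary middle layer
attn = outputs.attentions[layer][0].mean(0)  # average across heads

tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])
plt.imshow(attn, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.title(f"GPT-2 layer {layer} attention, averaged over heads")
plt.tight_layout()
plt.show()
```

With a tight prompt, the rows for the instruction tokens dominate; pad the context with junk and the mass visibly scatters, which is the attention-window saturation described above.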