Teaching a Language Model to Draw
Bachelor's Thesis · Faculty of Mathematics & Computer Science · University of Bucharest · 2026

A text model learns to draw,
guided by a vision critic.

Abstract

An Artist language model writes an SVG sketch from a text prompt. It cannot see. A separate Critic vision model looks at the rendered drawing and describes, in plain language, what is wrong. The Artist revises and the loop repeats. This work investigates whether a vision model can measurably improve the output of a text-only model with which it shares no weights, no architecture, and no training data.

Keywords. multimodal critique; SVG generation; iterative refinement; text-only drawing

prompt → Artist → SVG → render → Critic → feedback (1)

[Diagram: typed feedback graph. The prompt p and feedback f_t enter the Artist (language model, text -> SVG), which emits s_t. The Renderer (deterministic, SVG -> pixels) produces I_t = render(s_t). The Critic (vision model, image -> text) returns r_t, v_t, and f_{t+1}.]
Figure 1. Typed agent graph. The Artist writes SVG, the renderer produces pixels, and the Critic returns only score, verdict, and prose feedback for the next iteration.

s_t = A(p, f_t),   (r_t, v_t, f_{t+1}) = C(render(s_t))   (2)

The interface is intentionally narrow: the Artist emits SVG, while the Critic emits only scalar judgment and prose.
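The narrow interface in Equation (2) can be written down as types. The following is a minimal sketch, not the thesis implementation: the names `CriticReport`, `Artist`, `Renderer`, `Critic`, and `step` are hypothetical, and representing the raster image as `bytes` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class CriticReport:
    """Everything the Critic is allowed to return (hypothetical container)."""
    score: int      # r_t, on the 0-10 scale
    verdict: str    # v_t, either "revise" or "accept"
    feedback: str   # f_{t+1}, prose passed to the next Artist call


# Type aliases mirroring Equation (2); the names are illustrative only.
Artist = Callable[[str, str], str]        # (p, f_t) -> s_t, SVG source text
Renderer = Callable[[str], bytes]         # s_t -> I_t, raster image bytes (assumed)
Critic = Callable[[bytes], CriticReport]  # I_t -> (r_t, v_t, f_{t+1})


def step(artist: Artist, render: Renderer, critic: Critic,
         prompt: str, feedback: str) -> Tuple[str, CriticReport]:
    """One pass: s_t = A(p, f_t), then (r_t, v_t, f_{t+1}) = C(render(s_t))."""
    svg = artist(prompt, feedback)
    report = critic(render(svg))
    return svg, report
```

Under these signatures the Artist has no parameter through which pixels could reach it, and the Critic has no return field other than the score, the verdict, and the prose.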

Algorithm 1. Critic-guided SVG refinement.
  1. Input: text prompt p, maximum iterations T
  2. Initialize feedback f as the empty string.
  3. for t = 1, ..., T do
  4.   Artist writes SVG from p and previous feedback f.
  5.   Render SVG to a raster image.
  6.   Critic assigns score, verdict, and natural-language feedback.
  7.   if verdict is accept then return drawing.
  8.   Set f to the Critic feedback.
  9. end for
  10. return final drawing and Critic report.

Note. The Artist never sees the image; it only ever reads the Critic’s words. The run halts when the Critic accepts the drawing or the iteration cap is reached.
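Algorithm 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: `refine` and the three `toy_*` functions are hypothetical stand-ins, the toy scores are made up for the demonstration, and a real run would call a language model, an SVG rasterizer, and a vision model in their place.

```python
from typing import Callable, Tuple

Report = Tuple[int, str, str]  # (score, verdict, feedback)


def refine(prompt: str,
           artist: Callable[[str, str], str],
           render: Callable[[str], bytes],
           critic: Callable[[bytes], Report],
           max_iters: int = 4) -> Tuple[str, Report, int]:
    """Run the loop until the Critic accepts or the iteration cap is reached."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    feedback = ""  # step 2: feedback starts as the empty string
    for t in range(1, max_iters + 1):
        svg = artist(prompt, feedback)   # step 4: the Artist reads only text
        image = render(svg)              # step 5: deterministic rendering
        report = critic(image)           # step 6: score, verdict, feedback
        if report[1] == "accept":        # step 7: early exit on accept
            return svg, report, t
        feedback = report[2]             # step 8: carry the prose forward
    return svg, report, max_iters        # step 10: cap reached


# Toy stand-ins (assumptions for illustration only).
def toy_artist(prompt: str, feedback: str) -> str:
    shapes = '<circle cx="50" cy="50" r="30"/>'
    if "ear" in feedback:  # react to the Critic's words
        shapes += '<path d="M30 30 L40 10 L50 30 Z"/>'
    return ('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
            + shapes + '</svg>')


def toy_render(svg: str) -> bytes:
    return svg.encode("utf-8")  # placeholder for real rasterization


def toy_critic(image: bytes) -> Report:
    if b"<path" in image:
        return (9, "accept", "")
    return (4, "revise", "Add ears on top of the head.")


final_svg, final_report, used = refine("a cat", toy_artist, toy_render, toy_critic)
```

With these stand-ins the first pass is rejected, the feedback mentions ears, and the second pass is accepted, so `used` is 2. The only value carried between iterations is the feedback string, which matches the note above.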

A representative pass of the loop. The Artist sketches the subject stroke by stroke; the Critic looks at the result and answers in plain words. This is iteration three of a run on “a cat”: the drawing now reads clearly but still needs one more refinement pass.

(a) Artist sketch s_3, rendered from SVG path data.

Prompt: “a cat”
Iteration: 3 / 4
Score: 8 / 10
Verdict: revise

(b) Critic output returned to the Artist as text.

Figure 2. Representative iteration from a cat prompt. The left panel is the rendered Artist SVG; the right panel is the Critic response used for the next refinement pass.

At every iteration the Critic assigns a score from 0 to 10 and returns a verdict, either revise or accept. The loop is useful when the score rises under critique and the final verdict changes from revise to accept.

Result 1. In the representative run, r = (4, 6, 7, 9) and v_4 = accept; prose feedback is the only signal returned to the Artist.
[Plot: score r_t (axis 0 to 10) against iteration t (1 to 4). r_1 = 4, r_2 = 6, r_3 = 7, r_4 = 9; verdicts v_t: revise, revise, revise, accept.]
Figure 3. Pgfplots-style score trace for one refinement run. The curve is redrawn in reading order; the final marker denotes the accepted drawing.
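The usefulness criterion can be checked mechanically on the trace from Result 1. The sketch below takes one strict reading of that criterion as an assumption: the score must increase at every step, and only the final verdict may be accept. The function name `loop_helped` is hypothetical.

```python
from typing import List, Sequence


def score_gains(scores: Sequence[int]) -> List[int]:
    """Per-iteration change in the Critic score, r_{t+1} - r_t."""
    return [b - a for a, b in zip(scores, scores[1:])]


def loop_helped(scores: Sequence[int], verdicts: Sequence[str]) -> bool:
    """True if the score rose at every step and the verdict ended on accept.

    Assumed reading of the criterion: strictly increasing scores, with every
    verdict before the last equal to "revise" and the last equal to "accept".
    """
    rose = all(g > 0 for g in score_gains(scores))
    flipped = (len(verdicts) > 0
               and verdicts[-1] == "accept"
               and all(v == "revise" for v in verdicts[:-1]))
    return rose and flipped


# Trace reported in Result 1.
scores = [4, 6, 7, 9]
verdicts = ["revise", "revise", "revise", "accept"]

gains = score_gains(scores)            # [2, 1, 2]
helped = loop_helped(scores, verdicts)  # True
```

A looser reading, for example requiring only that the final score exceed the first, would replace the `rose` condition with `scores[-1] > scores[0]`.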
  1. Source code and local reproduction instructions. learn-to-draw-step-by-step, 2026.

This page also serves as a small reproduction cell. Live runs stream stroke by stroke from the inference backend.

Local reproduction command:

make local ARTIST=gemma3:27b CRITIC=blaifa/InternVL3_5:8b

Full setup is in the setup notes.