When an image is flagged as AI-generated, the natural follow-up question is: which model made it? This matters more than it might seem. Different models have different use cases, different user communities, and different forensic footprints. A Midjourney image circulating as a news photograph is a different kind of problem than a DALL-E 3 image posted on social media — and the tools to identify them differ accordingly.
This guide breaks down the distinct artifact signatures left by Midjourney and DALL-E 3, covering everything from pixel-level texture patterns to metadata and content credential support.
How the Two Models Work (and Why It Matters)
Understanding the artifacts starts with understanding the architecture.
Midjourney uses a proprietary diffusion model trained heavily on artistic and stylized imagery. It operates through Discord, renders images at high aesthetic quality, and tends to prioritize visual appeal over photorealism. Its generations are optimized for “wow factor” — dramatic lighting, rich color grading, painterly textures.
DALL-E 3, developed by OpenAI, is built on a diffusion architecture fine-tuned with human feedback and integrated into ChatGPT. It is optimized for instruction-following, meaning it tries to render exactly what the prompt describes. It also supports the C2PA (Coalition for Content Provenance and Authenticity) standard, embedding cryptographic provenance into its outputs.
These architectural differences leave measurable traces.
Visual Artifact Comparison
| Feature | Midjourney | DALL-E 3 |
|---|---|---|
| Texture handling | Painterly, over-smoothed surfaces; pores, fabric weave often missing | More realistic micro-texture, but inconsistent at edges |
| Edge rendering | Soft, often slightly blurred edges between subjects and backgrounds | Sharper edges, but halos visible at high contrast boundaries |
| Lighting | Dramatic, often cinematic; light sources may not be physically consistent | More conservative lighting; fewer dramatic shadows |
| Skin tones | Idealized, often too smooth; pores and fine lines absent | More varied, but blemishes and asymmetry are still regularized |
| Background coherence | Backgrounds often contain dreamlike or inconsistent elements | Backgrounds follow prompt instructions more literally but may repeat patterns |
| Text handling | Poor: letters blend, overlap, or turn into invented characters | Significantly improved over earlier DALL-E versions; short text often renders correctly |
| Hands and fingers | Historically poor; improved but extra digits still appear | Fewer errors, but knuckle geometry remains inconsistent |
| Watermark / branding | No visible watermark in upscaled outputs | No visible watermark, but C2PA metadata is embedded |
How They Look Under ELA (Error Level Analysis)
Error Level Analysis recompresses an image at a known quality level and maps the difference. Areas that have been modified or generated differently from surrounding regions show elevated error levels.
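The recompress-and-diff step at the heart of ELA can be sketched in a few lines of Python. This is a minimal illustration, assuming Pillow is available; production ELA tools also amplify the difference map and tune the quality level:

```python
from io import BytesIO

from PIL import Image, ImageChops


def ela_map(img: Image.Image, quality: int = 90) -> Image.Image:
    """Recompress the image as JPEG at a known quality and return the
    per-pixel difference. Regions that recompress differently from their
    surroundings show up brighter in the result."""
    buf = BytesIO()
    img.convert("RGB").save(buf, "JPEG", quality=quality)
    buf.seek(0)
    recompressed = Image.open(buf)
    return ImageChops.difference(img.convert("RGB"), recompressed)
```

In practice the difference map is rescaled (e.g. multiplied by 10-20x) before viewing, since raw error levels are only a few intensity units.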
Midjourney under ELA:
- ELA heatmaps tend to be relatively uniform, reflecting the model’s consistent noise profile across the entire canvas.
- Regions of high detail (hair, fabric, foliage) show slightly elevated ELA signals, but without the sharp boundaries you would expect from a composited or locally edited photograph.
- The background and foreground typically show similar ELA levels, which is unusual for a real photograph where depth-of-field creates natural variation.
DALL-E 3 under ELA:
- DALL-E 3 outputs often show subtly elevated ELA around object boundaries, likely a byproduct of its sharper edge rendering.
- Text elements (when present) frequently show distinctly different ELA levels from surrounding areas, making them easy to isolate.
- The overall ELA signature is slightly “noisier” than Midjourney’s, with more variation across different semantic regions.
In both cases, the absence of natural JPEG compression artifacts from a camera sensor is a key indicator — real photographs accumulate compression history differently than freshly generated images.
FFT Frequency Analysis
Fast Fourier Transform analysis converts an image into its frequency components. Cameras introduce specific noise patterns; generative models introduce different ones.
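One concrete statistic from this analysis is the fraction of spectral energy outside a low-frequency disc, which captures the "anomalously clean high frequencies" signal described below. A minimal NumPy sketch, with the radius threshold chosen arbitrarily for illustration:

```python
import numpy as np


def high_freq_energy_ratio(gray: np.ndarray, radius_frac: float = 0.25) -> float:
    """Fraction of total spectral power lying outside a central
    low-frequency disc. Real photos carry sensor noise in the high
    frequencies; generated images are often suspiciously clean there."""
    h, w = gray.shape
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # DC moved to center
    power = np.abs(spectrum) ** 2
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    outside = radius > radius_frac * min(h, w)
    return float(power[outside].sum() / power.sum())
```

A noisy camera-like patch scores well above a smooth synthetic gradient on this ratio; the absolute threshold separating "real" from "generated" would have to be calibrated on labeled data.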
Midjourney FFT signatures:
- Midjourney outputs show a relatively smooth frequency distribution with a bias toward mid-frequency content (consistent with its aesthetic smoothing).
- The high-frequency spectrum (fine grain, sensor noise) is notably absent or artificially regular.
- Periodic artifacts are rare but can appear in backgrounds with repeating synthetic textures.
DALL-E 3 FFT signatures:
- DALL-E 3 images show a slightly wider frequency distribution, reflecting its more realistic texture rendering.
- However, the absence of camera-specific noise still makes the high-frequency spectrum anomalously clean compared to a real photograph.
- When DALL-E 3 renders text, the FFT often shows grid-like artifacts in the frequency domain corresponding to the character rasterization process.
EXIF and Metadata
This is where the two models diverge sharply in a forensically useful way.
Midjourney:
- Outputs contain minimal or no EXIF data.
- There is no embedded camera make, model, GPS, or timestamp.
- The Software field, if present, may reflect the image editor used to save the final file rather than Midjourney itself.
- Upscaled images from Midjourney often show generic PNG metadata with no provenance information.
DALL-E 3:
- DALL-E 3 images generated via the API or ChatGPT may embed XMP metadata indicating OpenAI as the creator.
- More importantly, DALL-E 3 supports C2PA (Content Credentials), which embeds a cryptographically signed provenance chain directly into the file.
- This means a properly preserved DALL-E 3 image can be verified as AI-generated through any C2PA-compatible tool, including FakeRadar’s analysis pipeline.
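The asymmetry above can be probed with a quick EXIF check. The sketch below uses Pillow (assumed available) and the standard TIFF/EXIF tag IDs for camera-originating fields; an empty result is itself a forensic signal:

```python
from PIL import Image

# Standard TIFF/EXIF tag IDs for fields a real camera would populate.
CAMERA_TAGS = {271: "Make", 272: "Model", 306: "DateTime", 305: "Software"}


def camera_metadata(path: str) -> dict:
    """Return any camera-originating EXIF fields found in the file.
    Midjourney PNGs typically yield an empty dict; a genuine photo
    straight from a camera usually has at least Make and Model."""
    with Image.open(path) as img:
        exif = img.getexif()
        return {name: exif[tag] for tag, name in CAMERA_TAGS.items() if tag in exif}
```

Note that absent EXIF is only suggestive: many platforms strip metadata on upload, so real photographs can come back empty too.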
C2PA Support: A Critical Difference
C2PA is the emerging standard for content provenance. A C2PA manifest binds the image to its creation history using cryptographic signatures.
- DALL-E 3: Supports C2PA. Outputs carry a signed manifest identifying OpenAI as the producer.
- Midjourney: Does not currently implement C2PA. There is no embedded provenance chain.
This has a practical implication: a DALL-E 3 image that retains its original metadata can be definitively identified as AI-generated without any ML classifier. A Midjourney image offers no such self-identifying signal — forensic analysis must rely entirely on visual artifact detection, ELA, and FFT.
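A rough first-pass presence check for Content Credentials can be done without any C2PA tooling, since manifests are carried in JUMBF boxes labeled `c2pa`. The byte-scan below is a crude heuristic of my own, not part of any standard API: it can false-positive on unrelated data and says nothing about signature validity, which requires a proper C2PA library:

```python
def has_c2pa_boxes(path: str) -> bool:
    """Heuristic: look for the JUMBF box type and 'c2pa' label bytes that
    C2PA manifests embed (JPEG APP11 segments, PNG chunks). Detects the
    container only; it does NOT validate the cryptographic signature."""
    with open(path, "rb") as f:
        data = f.read()
    return b"jumb" in data and b"c2pa" in data
```

If this returns True, hand the file to a real C2PA validator to verify the signed manifest; if False, the credentials may simply have been stripped in transit.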
Which Model Is Harder to Detect?
Midjourney is generally harder to detect, for several reasons:
- Its aesthetic training produces images that look intentionally stylized, which can be mistaken for professional photography or illustration.
- It leaves no self-identifying metadata. There is no C2PA manifest, no software tag, no provenance signal.
- Its smooth, consistent noise profile makes ELA analysis less discriminating.
- High-resolution upscales can fool reverse image search because they are unique generations.
DALL-E 3 is somewhat easier to catch precisely because it wants to be transparent — the C2PA embedding is a feature, not a liability. But a DALL-E 3 image that has been screenshot, re-uploaded, or stripped of metadata loses this advantage and becomes nearly as difficult to detect as Midjourney output.
What FakeRadar Checks
FakeRadar’s analysis pipeline applies multiple layers to catch both models:
- Hive AI classifier — trained on broad AI-generated image datasets including both Midjourney and DALL-E outputs
- ELA heatmap — highlights generation-consistent error patterns
- FFT spectrum — flags anomalous frequency distributions
- C2PA verification — reads and validates content credential manifests
- EXIF analysis — flags missing or inconsistent metadata
No single signal is sufficient. The combination of all five is what makes model-agnostic detection reliable.
Ready to test an image? Upload it to FakeRadar’s analysis tool and get a full forensic report — including ELA heatmap, FFT spectrum, C2PA status, and EXIF breakdown — in seconds.