ChatGPT Can Now Think With Images. Here’s Why That Matters For Your Investigations.
- Noah Britt

- May 14
- 5 min read

When I first started combining computer science with OSINT work, everything was text-based. If I wanted to analyze a bunch of user posts, I’d use natural language processing techniques like sentiment analysis and named entity recognition. If I needed information from an image, I’d check the EXIF metadata - details like location, device, or timestamp - all still just text.
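For reference, pulling those details out of an image takes only a few lines of Python. Here's a minimal sketch using Pillow (the filename is a placeholder; images scraped from social platforms usually have this metadata stripped out, which is exactly the limitation the rest of this post runs into):

```python
# Minimal EXIF pull with Pillow. "post_photo.jpg" is just a placeholder path.
from PIL import Image, ExifTags

img = Image.open("post_photo.jpg")
exif = img.getexif()

# Top-level tags: device model, timestamp, software, etc.
for tag_id, value in exif.items():
    tag = ExifTags.TAGS.get(tag_id, tag_id)
    print(f"{tag}: {value}")

# GPS data lives in its own IFD (tag 0x8825), so resolve it separately.
gps = exif.get_ifd(0x8825)
for tag_id, value in gps.items():
    tag = ExifTags.GPSTAGS.get(tag_id, tag_id)
    print(f"GPS {tag}: {value}")
```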
When the first relatively strong GPT models started showing up not long ago, I stayed in that same text-first mindset. Sure, ChatGPT could describe an image, and that was useful, but it didn’t offer the same depth you could get from analyzing text. If I uploaded an image someone posted on Facebook, ChatGPT might say it showed a traffic jam, but the caption already told me that: “I’ve been stuck in traffic for ten hours.” At the time, I saw models like Sora and Google’s Veo as cool, but mostly just flashy toys. What I hadn’t considered was how much deeper image understanding could go, beyond simple descriptions.
Let’s shift gears for a second. Ever heard of the game GeoGuessr? It shows you a random Google Street View image, and your job is to guess where it is on a map. Back in my computer science classes, it was my go-to whenever I got bored. If you’d told me then that this game would eventually be useful in my job, I would’ve laughed.
Sorry for the tangent, but it’s relevant. Reasoning models have been around for a few months now, and they’re different from earlier large language models (LLMs) because they “think” before responding. OpenAI’s new o3 and o4-mini models can now “think” with images. What does that mean? Imagine pulling up a GeoGuessr image and trying to figure out where it was taken. You’d study the visual details - maybe zoom in on a license plate, or, if you were really good, analyze the tree types or mountain formations. Now, imagine a model trained on the entire Internet doing that. It’s that good.

Naturally, I had to see what it could do.
I was scrolling through Facebook and saw a video of a bison sitting on a road.
Not much to go on. There are lots of bison, and lots of roads. So I asked ChatGPT's o3 model to take a shot at determining where the video was taken. Just to be clear, I gave it a screenshot from the video with no location metadata. Here's what it came up with:
Best‑guess location
North Entrance ↔ Northeast Entrance road (US-212) in Yellowstone National Park - the short canyon stretch a few miles west of Lamar Valley, roughly 44.903° N, 110.160° W.
Why this spot fits
By piecing together those visual cues, the most likely location was a tight S-bend in Lamar Canyon, about 2 km west of the western meadows in Lamar Valley.
Here’s the wild part: the result was correct. Now, unless you're an environmental scientist (or a serious bison enthusiast), that specific detail might not seem all that useful.

But imagine you're investigating someone with very little available information. Think back to that traffic jam example - maybe someone posted a picture saying they were stuck in traffic, but you have no idea where they're located. I searched around Facebook for a real-world post that fit this situation, and found the image to the right. (The license plate is blocked out in the version shown here, but for context, the original was just as blurry.)
Once again, there isn't much to work with. But when you're trying to figure out someone's location, the smallest clues can be invaluable. In this case, all I had was a simple screenshot - no metadata at all. Here’s what it came up with:
Best-guess location
The scene almost perfectly matches a four-lane stretch of Washington State Route 18 on the west side of Tiger Mountain - near milepost 24-25, at approximately 47.48° N / 121.98° W - heading west toward Issaquah.
Why this spot fits
Putting all the clues together, the only location that checks every box is the westbound carriageway of SR 18, just west of the Tiger Mountain summit - about 2 miles west of the I-90 interchange and 4 miles east of Issaquah-Hobart Road.
If you drop a pin at 47° 28′ 45″ N, 121° 58′ 48″ W, the downslope view into the small valley matches the photo: the same stand of conifers, the same clearing on the inside of the curve, and the same shoulder sign hierarchy used by WSDOT.
If even one detail were off, like the number of lanes, the presence of a cable barrier, or an Oregon plate, the answer would have changed. But every indicator matched that Tiger Mountain stretch.
Not only was the answer correct, but the logic and reasoning behind it were impressive. As someone who’s spent a fair amount of time playing GeoGuessr, I probably would have given up on this one. ChatGPT didn’t, and if this were a real investigation, think of how many new leads this kind of analysis could provide.
More importantly, if you're integrating AI tools into your workflow like we are, all of this can be done through the OpenAI API. That means this kind of image analysis could be applied to dozens of posts automatically. We’re still in the early stages, but anyone doing OSINT work should consider adding this to their toolkit.
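To give a rough idea of what that looks like, here's a sketch using the official openai Python package. The model name, prompt wording, and folder path are illustrative choices rather than a recipe, and you'd want to add your own rate limiting and error handling before pointing it at dozens of posts:

```python
# Sketch: ask a vision-capable reasoning model to geolocate a folder of screenshots.
# The model name, prompt, and paths below are illustrative placeholders.
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Estimate where this photo was taken. Explain the visual cues "
    "(signage, vegetation, road markings) that support your guess."
)

def geolocate(image_path: Path) -> str:
    """Send one image to the model and return its location reasoning."""
    encoded = base64.b64encode(image_path.read_bytes()).decode()
    response = client.chat.completions.create(
        model="o3",  # any vision-capable reasoning model available to your account
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Run the same analysis across every screenshot collected for a case.
for path in sorted(Path("screenshots").glob("*.jpg")):
    print(f"--- {path.name} ---")
    print(geolocate(path))
```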
How are you exploring new tools like ChatGPT to enhance your investigative work? We'd love to hear how you're using - or planning to use - AI to analyze digital evidence in the comments below.
For more insights on integrating AI into your investigative strategy, follow Social Slooth and Noah Britt on LinkedIn or reach out to our team directly.


