Because the model has a huge number of polygons and we have to play at 60 frames per second. Here, let me visually demonstrate what I mean.

Here’s Ryu hitting Zangief with a Shoryuken:

Now… let’s say that we’ve got just one big hit box around Zangief and a hurt box around Ryu’s fist here. The hurt box around Ryu’s fist intersects Zangief’s hit box, ergo Zangief gets hit.

How many collision checks do you need to make here to see if Zangief gets hit? Just one - does Ryu’s hurt box overlap with Zangief’s one hit box? If yes, Zangief gets hit.
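If you want to see what that single check might look like in code, here's a minimal sketch. It assumes the boxes are plain axis-aligned rectangles; the `Box` type, the `overlaps` function, and all of the coordinates are made up for illustration and aren't taken from any real game.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle (hypothetical type for this sketch)."""
    left: float
    bottom: float
    width: float
    height: float


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two rectangles share any area."""
    return (
        a.left < b.left + b.width
        and b.left < a.left + a.width
        and a.bottom < b.bottom + b.height
        and b.bottom < a.bottom + a.height
    )


# Made-up coordinates: one big box for Zangief, one small box on Ryu's fist.
zangief_hit_box = Box(left=100, bottom=0, width=80, height=200)
ryu_hurt_box = Box(left=90, bottom=150, width=20, height=20)

# One check, one answer.
zangief_got_hit = overlaps(ryu_hurt_box, zangief_hit_box)
```

Four comparisons and we're done, which is about as cheap as a collision check gets.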

But let’s say that we decide we want a more accurate representation of Zangief’s body. Like… say, instead of just one big box, let’s put a bunch of smaller boxes on Gief’s body. Like so:

The more small boxes we add, the closer an approximation it becomes to Gief’s actual body. Right? Now here’s a question - how many collision checks do I need to make in order to tell if Ryu hit Gief this frame?

If you said “One for every green box”, you get a cookie. Now… how many green boxes are there in this new representation of Gief? I count 11 here - that’s ten more than before. On any given attacking frame, in the worst-case scenario, I will need to check whether any of the 11 boxes have been hit by any of the hurt boxes of the attacking move.

Did you see what I added there? I said “*any* of the hurt boxes of the attacking move” - there might be more than one. For example, look at a move like this:

Yun is attacking with his leading (left) palm, but he also extends his right palm at the same time. Shouldn’t that also have a hurt box? If so, that’s another full set of checks, one against each of the defender’s hit boxes, that has to happen this frame.
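Here's a rough sketch of that worst case as a nested loop, with a counter so you can see how many checks actually happen. Everything in it is hypothetical: the tuple layout for a box, the `find_hit` helper, and the coordinates are assumptions for the example, not how any particular engine does it.

```python
from typing import List, Tuple

# (left, bottom, width, height); a hypothetical layout chosen for this sketch
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned rectangles share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def find_hit(hit_boxes: List[Box], hurt_boxes: List[Box]) -> Tuple[bool, int]:
    """Return (was anything hit, how many overlap checks were made)."""
    checks = 0
    for hurt in hurt_boxes:
        for hit in hit_boxes:
            checks += 1
            if overlaps(hurt, hit):
                return True, checks  # stop at the first overlap
    return False, checks


# 11 made-up boxes stacked up the defender's body, 2 attacking boxes that miss.
gief_hit_boxes = [(100.0, 20.0 * i, 40.0, 20.0) for i in range(11)]
yun_hurt_boxes = [(0.0, 50.0, 20.0, 20.0), (30.0, 80.0, 20.0, 20.0)]

was_hit, checks = find_hit(gief_hit_boxes, yun_hurt_boxes)
# A complete miss is the worst case: every pair gets tested, 11 x 2 = 22 checks.
```

Note that a clean miss is the expensive case. If something connects we can stop early, but when nothing does, we had to test every single pair to find that out.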

## Total number of collision checks I need to make = (# of hit boxes) x (# of hurt boxes)

Are you starting to see why we don’t use the full model for hit detection? There are often thousands or even tens of thousands of polygons that go into a fully rendered model. When we take a swing, we would have to check every polygon we mark as damaging against every single polygon that can be hit in order to be sure whether we’ve hit it. With thousands of polygons potentially colliding, that can easily mean millions of collision checks *every frame*. We gain some accuracy, but it slows gameplay to a crawl, and the added accuracy isn’t even particularly fun.

Overwatch, as an example, is extremely lenient with its collision detection, and is none the worse for wear because of it:

Overall, that tradeoff really isn’t worth the performance cost. We always have to prioritize performance and frame rate over missing by inches, and that’s why we use hit boxes that only generally approximate the shapes of our characters. It’s much faster to handle these calculations this way.
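To put rough numbers on that comparison, here's a back-of-the-envelope calculation. The 11 and 2 come from the box examples above; the polygon counts (10,000 hittable, 500 damaging) are assumptions I picked to match "thousands or even tens of thousands", not measurements from a real character model. The `checks_per_second` helper is made up for the example.

```python
FRAMES_PER_SECOND = 60


def checks_per_second(hit_shapes: int, hurt_shapes: int,
                      fps: int = FRAMES_PER_SECOND) -> int:
    """Worst-case pair checks per second: (# hit) x (# hurt) x frames."""
    return hit_shapes * hurt_shapes * fps


# Box approximation: 11 hit boxes on the defender, 2 hurt boxes on the attack.
box_cost = checks_per_second(11, 2)

# Full-model approach with ASSUMED polygon counts (illustrative only).
mesh_cost = checks_per_second(10_000, 500)

print(f"boxes:    {box_cost:,} checks per second")
print(f"polygons: {mesh_cost:,} checks per second")
print(f"ratio:    roughly {mesh_cost // box_cost:,}x more work")
```

And that's for one attacker and one defender, before counting the fact that a polygon-versus-polygon test is itself more expensive than a box-versus-box test.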

PS. If you’re interested in the actual math behind collision detection (which is a serious issue in practically all video games), I highly recommend the book *Real-Time Collision Detection* by Christer Ericson. Fair warning - it’s *VERY* math-heavy, especially with regard to 3D math. If you haven’t enjoyed college-level linear algebra at the very minimum, it may just seem like gibberish to you. It’s also a technical book and priced to match. You have been warned.
