In case it wasn’t clear, the third example cannot be aligned to the times you want, because the animation can only be changed on a new game frame (not to be confused with an animation frame).
Game frames are either ~17 ms apart or ~33 ms apart (roughly 60 or 30 frames per second), but the gaps WILL vary.
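To make the effect concrete, here is a rough sketch of how a requested change time gets pushed to the next game frame. The function names and the frame intervals are made up for illustration and are not taken from any engine's API; real frame gaps will differ from run to run.

```python
from bisect import bisect_left
from itertools import accumulate


def build_frame_times(intervals_ms):
    """Turn per-frame gaps (ms) into cumulative frame timestamps (ms)."""
    return list(accumulate(intervals_ms))


def snap_to_frames(requested_ms, frame_times_ms):
    """For each requested time, return the first game frame at or after it.

    Returns None for a request that falls after the last known frame.
    frame_times_ms must be sorted in ascending order.
    """
    snapped = []
    for t in requested_ms:
        i = bisect_left(frame_times_ms, t)
        snapped.append(frame_times_ms[i] if i < len(frame_times_ms) else None)
    return snapped


# Hypothetical frame gaps: mostly ~17 ms, with one slow ~33 ms frame.
frames = build_frame_times([17, 16, 18, 17, 33, 17, 16])

# Times (ms) at which we would like the animation to change.
wanted = [40, 70, 100]
actual = snap_to_frames(wanted, frames)

for w, a in zip(wanted, actual):
    print(f"wanted {w} ms, changed at {a} ms, late by {a - w} ms")
```

With these made-up numbers, the 70 ms and 100 ms requests both land on the same frame, which is the kind of drift you cannot avoid when changes only happen on game frames.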
The ONLY true way to get a consistently timed animation like you want is to make the animation artwork itself play that way, with the sequence composed of multiple frames.
In short, it ain’t easy to get animations that perfectly align with speech. However, you may find your approach is just ‘good enough’.