Play slower shots and you can get away with simple, efficient mathematics to approximate the distances involved.
I think that for slower speeds you can get a close-to-exact speed simply by tracking the ball's position from frame to frame and dividing the distance travelled by the elapsed time.
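Here is a minimal sketch of that idea, assuming an overhead view, that OpenCV's Hough circle detector can find the ball, and that `PIXELS_PER_METRE` and `FPS` are calibration values you measure yourself (the numbers below are placeholders, not anything I've tuned):

```python
import cv2
import numpy as np

# Assumptions (not from the original post): the camera looks straight down at
# the table, and PIXELS_PER_METRE was measured once from a known distance on
# the cloth. Both constants are placeholders.
PIXELS_PER_METRE = 1500.0   # hypothetical calibration value
FPS = 30.0                  # advertised frame rate of the slow camera

def ball_centre(frame):
    """Return the (x, y) pixel centre of the cue ball, or None if not found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=50,
                               param1=100, param2=30, minRadius=10, maxRadius=60)
    if circles is None:
        return None
    x, y, _r = circles[0][0]
    return float(x), float(y)

def speed_between_frames(frame_a, frame_b, frames_apart=1):
    """Estimate speed (m/s) from the ball's displacement between two frames."""
    a, b = ball_centre(frame_a), ball_centre(frame_b)
    if a is None or b is None:
        return None
    pixels = np.hypot(b[0] - a[0], b[1] - a[1])
    metres = pixels / PIXELS_PER_METRE
    seconds = frames_apart / FPS
    return metres / seconds
```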
For faster speeds you use AI. Basically you want to look at the cue ball's distortion: the faster it is moving, the more oval (smeared) it will look in a single frame, because it travels a noticeable distance while the shutter is open.
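As a rough model (my assumption, not something measured): if the distortion is mostly motion blur, then during one exposure of length t_exp the ball travels v * t_exp, so the streak is about one ball diameter plus v * t_exp long, which you can invert for a crude first-order speed estimate:

```python
# Crude motion-blur model (an assumption, not from measurements): during one
# exposure of length exposure_s, a ball of diameter ball_diameter_m travels
# v * exposure_s, so the streak length is roughly ball_diameter_m + v * exposure_s.
def speed_from_blur(streak_len_m, ball_diameter_m=0.057, exposure_s=1 / 60):
    """Invert the blur model for a first-order speed estimate in m/s."""
    return max(streak_len_m - ball_diameter_m, 0.0) / exposure_s
```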
To collect data for this you want to set up a high speed camera and a low speed camera, shoot a bunch of shots at different speeds, and record them all with both cameras.
Now use the high speed camera footage to calculate each shot's actual speed. Take the frames from the low speed camera and feed them into a CNN (convolutional neural network), after filtering out everything that isn't the cue ball, since that is just noise. Have the outputs of the CNN correspond to discrete speed bins.
Now train the CNN using the actual speeds from the high speed camera as labels for the corresponding distorted pictures. You could even feed the rough distance-over-time speed from the slow camera in as an additional input to the CNN (discretized, of course; this might not be the best way, though), or alternatively feed multiple consecutive frames into the CNN.
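A minimal sketch of what such a classifier could look like (PyTorch here purely as an example; `N_BINS`, `PATCH`, and the layer sizes are placeholders, and the data loader that pairs cropped low-speed-camera patches with speed bins from the high speed camera is assumed to exist):

```python
import torch
import torch.nn as nn

N_BINS = 20   # number of discrete speed classes (assumption)
PATCH = 64    # cropped cue-ball patch is PATCH x PATCH pixels (assumption)

class BlurSpeedCNN(nn.Module):
    """Maps a cropped, possibly motion-blurred cue-ball patch to a speed bin."""
    def __init__(self, n_bins=N_BINS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * (PATCH // 8) ** 2, n_bins)

    def forward(self, x):              # x: (batch, 1, PATCH, PATCH)
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Training loop sketch: `loader` yields (patch, bin_index) pairs where the bin
# index comes from the speed measured with the high speed camera.
model = BlurSpeedCNN()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_epoch(loader):
    for patches, speed_bins in loader:
        optim.zero_grad()
        loss = loss_fn(model(patches), speed_bins)
        loss.backward()
        optim.step()
```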
To make this work properly you will probably have to create training data with a variety of low speed cameras and frame rates/resolutions.
To me this seems like the best way to do it.
Alternatively, you could measure the length of the distorted oval along its long axis versus its short axis, feed those measurements into a regression model, and have it predict the actual speed. Again you will need a variety of training data. This will give you a continuous "exact" value, unlike the former method, which produces discretized, finite-state predictions.
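A sketch of that alternative, assuming you can isolate the blurred ball as a binary mask and fit an ellipse to it with OpenCV, and that you have axis measurements paired with true speeds from the high speed camera. A plain linear regression stands in for whatever regression model you would actually pick, and the numbers are made up purely for illustration:

```python
import cv2
import numpy as np
from sklearn.linear_model import LinearRegression

def oval_axes(mask):
    """Major/minor axis lengths (pixels) of the blurred ball in a binary mask."""
    # OpenCV 4.x return signature; fitEllipse needs a contour with >= 5 points.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (_cx, _cy), (w, h), _angle = cv2.fitEllipse(largest)
    return max(w, h), min(w, h)

# X: one row of [major, minor] per training shot; y: speed from the high speed
# camera. These values are invented examples, not real measurements.
X = np.array([[80.0, 40.0], [55.0, 41.0], [43.0, 42.0]])
y = np.array([3.0, 1.5, 0.4])

reg = LinearRegression().fit(X, y)
print(reg.predict([[70.0, 40.0]]))   # continuous speed estimate for a new shot
```

Using both axes (rather than just the long one) lets the model partly normalise away camera distance and resolution, since the short axis is roughly the undistorted ball diameter in pixels.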
In general, any digital camera has a weak microcontroller (MCU), an attached USB interface chip, a single A/D converter, and a decoder to address the individual photo cells (pixels), i.e. to connect them serially to the A/D via a scan algorithm. How fast the scanning of cells via the decoder takes place depends on the particular camera and the resolution mode it is in.
There is also no guarantee that the scanning takes place in a top-to-bottom, row-by-row linear fashion, hence the need for a lot of training data to deal with these variables. I would assume as well that the time between frames is not guaranteed to be constant. (The fact that the MCU has to handle interrupts from the USB chip whenever USB frames arrive will by itself keep the interval from being constant.) This could add noise as well.
Maybe with a cell phone camera the video frames are written directly into memory via a shared bus, but even this can introduce delays. It really depends on the design of the architecture. Either way, the transfer of video frames from the camera's MCU to the memory of the main CPU will cause the advertised period between frames to vary.
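If you want to see how unsteady a given camera's frame delivery actually is, a quick check like the following helps (device index, frame count, and the use of OpenCV are all assumptions; this measures when frames arrive at the application, so it folds in driver and USB transfer delays, which is exactly the variation described above):

```python
import time
import cv2
import numpy as np

cap = cv2.VideoCapture(0)            # device index 0 is an assumption
arrival = []
for _ in range(300):
    ok, _frame = cap.read()
    if not ok:
        break
    arrival.append(time.monotonic())  # wall-clock time each frame was delivered
cap.release()

deltas = np.diff(arrival) * 1000.0   # milliseconds between delivered frames
print("mean frame period (ms):", deltas.mean())
print("std dev (ms):          ", deltas.std())   # non-zero spread = delivery jitter
```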