Tools like Otter.ai or VOMO.ai can process an MP4 file and provide a time-stamped transcript.
If you are looking for a transcript or a summary of a specific file you have, such as video-040.mp4, you can use these tools to convert the video's audio into text.
Could you tell me what the first few seconds look like?