Hi all,
I got a question from a student on how to answer the questions about the analysis windows in Lab1. I am answering here so that everybody can see.
As we discussed during the lecture, feature frames are computed over short windows of speech samples. The window is then shifted in time to compute the next feature vector and the shift may be smaller than the window length (so that contiguous windows overlap in time).
The number of speech samples per second is given by the sampling frequency. The length of the window and its shift are given by the corresponding configuration parameters. You should be able to relate these three quantities to determine how many samples each window contains and by how many milliseconds (or, if you wish, samples) consecutive windows overlap.
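As an illustration of how these three quantities relate, here is a minimal Python sketch. The function name and the numbers (16 kHz sampling, 25 ms window, 10 ms shift) are assumptions chosen for the example only; use the values from your own Lab1 configuration.

```python
def window_stats(sampling_rate_hz, window_ms, shift_ms):
    """Relate sampling frequency, window length and window shift.

    All three arguments are hypothetical example inputs, not the Lab1 settings.
    """
    # Samples per window = samples per second * window duration in seconds.
    samples_per_window = round(sampling_rate_hz * window_ms / 1000)
    # Samples by which the window moves between consecutive frames.
    samples_per_shift = round(sampling_rate_hz * shift_ms / 1000)
    # Consecutive windows overlap by whatever part of the window is not shifted.
    overlap_ms = window_ms - shift_ms
    overlap_samples = samples_per_window - samples_per_shift
    return samples_per_window, samples_per_shift, overlap_ms, overlap_samples


# Example values only (assumed, not taken from the lab configuration).
print(window_stats(16000, 25, 10))  # (400, 160, 15, 240)
```

With these example values, each window holds 400 samples and consecutive windows share 15 ms, that is 240 samples.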
To answer the question about how many feature vectors you have in a typical utterance, you should first decide how long a typical utterance is (you can have a look with wavesurfer at the ones you have created). Then you have to determine which of the parameters mentioned above is relevant in this case. You can safely disregard boundary effects at the beginning or end of the utterance because we are talking about an approximate number of feature vectors.
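If you want to check your reasoning, the estimate can be sketched as follows. This is only a rough sketch that ignores boundary effects, as noted above. It assumes that one feature vector is produced per window shift, and the 2-second duration and 10 ms shift are made-up example values, not measurements from the lab data.

```python
def approx_num_frames(duration_s, shift_ms):
    """Approximate number of feature vectors in an utterance.

    Assumes one feature vector per window shift and ignores the partial
    windows at the start and end of the utterance.
    """
    return int(duration_s * 1000 / shift_ms)


# Hypothetical utterance of 2 seconds with an assumed 10 ms shift.
print(approx_num_frames(2.0, 10))  # 200
```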
I hope this clarifies the questions.