Exact number of audio samples in WMA (lossless)

Discussion:

Holger

2010-04-12 13:17:01 UTC

Hi,

I'm using a WMSyncReader to decode WMA files. Is there an easy way to find
out the exact number of audio samples (audio sample frames) decoded in the
file without actually decoding the complete file from beginning to end and
summing up the sample counts from the buffers returned from the
IWMSyncReader::GetNextSample calls? I'd really like to avoid decoding the
complete file only to find out the exact sample count. Any hints are welcome!

Holger
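
For reference, the brute-force count described above comes down to the arithmetic below. This is only a minimal sketch and does not call the Windows Media Format SDK: the buffers are assumed to be the raw PCM byte blocks taken from each decoded sample, and `count_pcm_frames`, `channels` and `bits_per_sample` are illustrative names.

```python
def count_pcm_frames(buffers, channels, bits_per_sample):
    """Sum decoded buffer sizes and convert the byte total to PCM frames."""
    block_align = channels * (bits_per_sample // 8)  # bytes per sample frame
    total_bytes = sum(len(buf) for buf in buffers)
    if total_bytes % block_align:
        raise ValueError("decoded data does not end on a frame boundary")
    return total_bytes // block_align


# Hypothetical stand-in for decoder output: three buffers of 16-bit stereo PCM.
buffers = [bytes(4096), bytes(4096), bytes(1000)]
print(count_pcm_frames(buffers, channels=2, bits_per_sample=16))
```

The cost is that every buffer has to be decoded before the total is known, which is exactly what the question tries to avoid.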

somebody

2010-04-22 08:27:55 UTC

Post by Holger
I'm using a WMSyncReader to decode WMA files. Is there an easy way to find
out the exact number of audio samples (audio sample frames) decoded in the
file without actually decoding the complete file from beginning to end and
summing up the sample counts from the buffers returned from the
IWMSyncReader::GetNextSample calls? I'd really like to avoid decoding the
complete file only to find out the exact sample count. Any hints are welcome!

Holger, you can retrieve an audio stream duration for example and convert
that to samples, but I think you will not get the correct number of samples.
And here is why I think so:
After decompressing audio compressed with a lossy compressor, you will get
more audio samples than you provided while you were compressing. This is
very strange (even stupid) if you ask me, but, that's the way Windows Media
works. However, since you use lossless compression, you might be lucky.
Perhaps in that case the retrieved duration will be correct.

Alessandro Angeli

2010-04-22 18:07:35 UTC

From: "somebody"

Post by somebody
After decompressing audio compressed with a lossy compressor, you will
get more audio samples than you provided while you were compressing.
This is very strange (even stupid) if you ask me, but, that's the way
Windows Media works.

That's not how WM works, that's how every efficient audio compression
scheme works. To achieve high compression ratios at high quality, WMA or
MP3 or AAC or AC-3... do not compress the stream on a sample by sample
basis but in blocks (frames). The last frame in the stream is padded to
make up a whole frame so that it can be compressed. Unless there is a
field in the frame header to state how much padding was added, it is
impossible to recover the original sample count. Since this precision is
an uncommon requirement and saving bits instead is, there is no such
field (at least in MP1/MP2/MP3, I haven't checked whether they added it
in AAC or AC-3, but I doubt it -- WMA does not seem to have it, either).

--
// Alessandro Angeli
// MVP :: DirectShow / MediaFoundation
// mvpnews at riseoftheants dot com
// http://www.riseoftheants.com/mmx/faq.htm
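
The padding argument above can be put in numbers. The following is a rough sketch, assuming a codec with a fixed frame size and ignoring any encoder lead-in delay; the frame size of 1152 samples is the MPEG-1 Layer III value, the input length is just an example, and `padded_length` is an illustrative name.

```python
def padded_length(sample_count, frame_size):
    """Return (frames, padded_samples, padding) for a block-based codec."""
    frames = -(-sample_count // frame_size)  # ceiling division
    padded = frames * frame_size
    return frames, padded, padded - sample_count


# Example: 836724 input samples per channel, 1152-sample frames.
frames, padded, padding = padded_length(836724, 1152)
print(frames, padded, padding)
```

Without a stored value for `padding`, a decoder can only hand back the padded length.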

Holger

2010-04-23 12:41:02 UTC

Post by somebody
Holger, you can retrieve an audio stream duration for example and convert
that to samples, but I think you will not get the correct number of samples.

Yes, but the duration is in milliseconds, so it doesn't give the exact number
of samples.
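
To illustrate how coarse that is, here is a small sketch. It assumes the reported duration is truncated to whole milliseconds and a sample rate of 44.1 kHz (neither is stated in this thread), and `frame_count_range` is an illustrative helper, not an SDK call.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative values."""
    return -(-a // b)


def frame_count_range(duration_ms, sample_rate):
    """Lowest and highest frame count whose length truncates to duration_ms."""
    lowest = ceil_div(duration_ms * sample_rate, 1000)
    highest = ceil_div((duration_ms + 1) * sample_rate, 1000) - 1
    return lowest, highest


rate = 44100                        # assumed sample rate
frames = 836724                     # example frame count
duration_ms = frames * 1000 // rate # duration as whole milliseconds
low, high = frame_count_range(duration_ms, rate)
print(duration_ms, low, high, high - low + 1)
```

Under these assumptions a single millisecond value is consistent with 44 different frame counts, so the duration alone cannot identify the exact one.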

Post by Alessandro Angeli
That's not how WM works, that's how every efficient audio compression
scheme works. To achieve high compression ratios at high quality, WMA or
MP3 or AAC or AC-3... do not compress the stream on a sample by sample
basis but in blocks (frames). The last frame in the stream is padded to
make up a whole frame so that it can be compressed. Unless there is a
field in the frame header to state how much padding was added, it is
impossible to recover the original sample count. Since this precision is
an uncommon requirement and saving bits instead is, there is no such
field (at least in MP1/MP2/MP3, I haven't checked whether they added it
in AAC or AC-3, but I doubt it -- WMA does not seem to have it, either).

Well, I'm sure that WMA Lossless does have this field. I agree that getting
the exact number of frames is usually not a requirement for lossy compression
though. However, you usually use lossless compression to get exactly the
same samples that you had before encoding something. This should not only
apply to the values of the samples, but also to the number of samples. And in
fact, for WMA lossless this is true. To prove that, I have encoded a file
with 836724 PCM frames (stereo samples) using WMA Lossless. Then I used
IWMSyncReader to decode the file and I got exactly 836724 PCM frames out of
the decoder. To check if the sample count was just the same by accident, I
cut out one sample frame from my original file, encoded and decoded again and
got exactly 836723 output frames this time. Therefore WMA Lossless does
preserve the exact number of audio frames and it must be stored somewhere in
the file. So there should be some way to get this number without actually
having to decode the whole file...

Alessandro Angeli

2010-04-23 17:32:38 UTC

From: "Holger"

[...]

Therefore WMA Lossless does preserve the exact number of audio frames
and it must be stored somewhere in the file. So there should be some
way to get this number without actually having to decode the whole
file...

Sorry, I missed the fact that you were asking about WMAL.

A WMA file is an ASF file container that contains WMA*-encoded audio
data.

The WM[Sync]Reader parser only understands the ASF file syntax (which is
even published), while the WMA* data bitstream syntax is only understood
by the WMA* decoder (and is secret).

I am quite sure the number of valid samples per frame is stated
somewhere in the WMAL bitstream, after all only the decoder needs this
information. Which means that you can not extract it from the ASF file,
whether by yourself or through the WMF parsers, since it is not there.
If you knew how to read the WMAL bitstream, you could extract it from
there yourself but, since you don't, you can only ask the decoder and
the decoder will only tell you if you let it decode the stream.

On the other hand, you can take a look at all the info in the ASF file
using the free ASFView tool from Microsoft and maybe there is some kind
of info in the file header that can help you (and, if there is, you can
probably get the parser to tell you or, in the worst case scenario, you
can extract it yourself). But I never noticed anything of the kind.

--
// Alessandro Angeli
// MVP :: DirectShow / MediaFoundation
// mvpnews at riseoftheants dot com
// http://www.riseoftheants.com/mmx/faq.htm

somebody

2010-04-26 10:36:31 UTC

Or, if he has custom capturing software that creates WMA/WMV files, he could
add a custom attribute (number of samples) to the header.
If not, perhaps some kind of WMA/WMV indexing with the proper flags might
lead to getting the exact audio duration. For example, if you want to seek by
frame numbers, you have to index the WMV file. (It is slow on huge video files,
and sometimes never ends).

Holger

2010-04-29 12:32:01 UTC

Post by Alessandro Angeli
I am quite sure the number of valid samples per frame is stated
somewhere in the WMAL bitstream, after all only the decoder needs this
information. Which means that you can not extract it from the ASF file,
whether by yourself or through the WMF parsers, since it is not there.
If you knew how to read the WMAL bitstream, you could extract it from
there yourself but, since you don't, you can only ask the decoder and
the decoder will only tell you if you let it decode the stream.
On the other hand, you can take a look at all the info in the ASF file
using the free ASFView tool from Microsoft and maybe there is some kind
of info in the file header that can help you (and, if there is, you can
probably get the parser to tell you or, in the worst case scenario, you
can extract it yourself). But I never noticed anything of the kind.

Thanks for your comments. After having done some more investigations now, I
can completely confirm that. I've created some WMA Lossless test files, one
with only one stereo sample, another one with two stereo samples, and another
one with three of them. In the ASF file headers, everything is identical,
except the bitrate values and the file id. What is different of course is the
payload of the (only one) data object in the file. I've even found a value
which might indicate the sample count in the data object, but since it's not
obvious and even seems to be at different offsets in the packet, I will not
make any effort to analyse this any further. So it seems that the exact
sample count is indeed encoded in the WMA Lossless file, but there is no
documented way to get it without actually decoding the whole file.

Holger
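
A comparison like the one described above can be done with a few lines of code. This is only a sketch of the general idea, not the tool actually used: `differing_offsets` is an illustrative name and the byte strings are made-up stand-ins for two test files (in practice they would come from reading each file in binary mode).

```python
def differing_offsets(a, b):
    """Return the offsets at which two byte strings differ.

    Offsets past the end of the shorter input count as differences.
    """
    common = min(len(a), len(b))
    diffs = [i for i in range(common) if a[i] != b[i]]
    diffs.extend(range(common, max(len(a), len(b))))
    return diffs


# Made-up data standing in for two nearly identical files.
file_one = bytes([0x30, 0x26, 0x01, 0x00, 0x10])
file_two = bytes([0x30, 0x26, 0x02, 0x00, 0x10, 0x7F])
print(differing_offsets(file_one, file_two))
```

Fields that change in step with the sample count across several test files are candidates for the stored length, although, as noted above, a value that moves between offsets is hard to pin down this way.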

somebody

2010-05-17 08:56:13 UTC

Holger, could you provide a sample file 5-10 seconds long? I would like to
try something with it.

h***@40th.com

2010-04-23 13:05:01 UTC

Since this precision is an uncommon requirement and saving bits instead
is, there is no such field (at least in MP1/MP2/MP3, I haven't checked
whether they added it in AAC or AC-3, but I doubt it -- WMA seems to not

MP3 gets it from the xing-type header, both lead and
tail padding, and fhg does too (fhg lead padding only,
but you can work that out to get the tail). iTunes has
its own crafted gapless header data for MP3 and AAC (as
of ver. 7). AC3 as a file, no, but you could do your
own, I suppose (but who listens to AC3 as a file?).
AC3 as a stream would take on whatever the source is.
One can always go out of the way to mess that up.
Ogg/V (it's lossy) is gapless without any extra info.
No one has tried to get gapless from WMA (except WMA
lossless). Lossless is of course gapless (no extra
data, front or back).

As to this "saving bits" thing as a reason for not
having gapless info, that's ... funny. It only takes
a few dozen bytes of info. What's a few dozen bytes
in a 10 MB file so you can completely avoid gaps between
album tracks? As for a practical example of this, you
need only peek into the .sig below. Perfectly gapless
for nearly a decade.... but you need the source (.mp3, .m4a,
etc.) to include the gapless info (or be gapless from the
start) to get that, and you need the software to take
advantage of it. You don't just get gapless. You have
to do gapless.

--
40th Floor - Software @ http://40th.com/
PhantasmX3 - The finest sound in the world
CastleKeeper - network camera surveillance/recorder, NVR

Alessandro Angeli

2010-04-23 17:22:49 UTC

[Post body hidden by the archive.]

h***@40th.com

2010-04-23 21:51:18 UTC

Post by Alessandro Angeli
Yes, but none of this is standard. My point was that this info could
have been included in the format specs, but none of the formats seem to

Well, it's here today, and was here nearly 10
years ago. Gapless info was in lame 3.90, an
mp3 encoder, which came out sometime around
the turn of the century (the 3.90 version, that
is). You may as well say id3 is not a standard
(never mind that it's a mess). The gapless info
is, in a word, trivial. It's the making-use-of
that's not, and why so few players can do REAL
gapless (real gapless = same as CD gapless).

Would you settle for a CD player that had gaps
in playing back tracks? I would hope not, or
I am REALLY wasting my time writing this. haha

Post by Alessandro Angeli
You should read the appendices in the MPEG-1 standards that explain some
of their weirdest choices, mostly giving small (by today's standards)
savings as the reason.

Gapless info is a one-time cost item. A few
dozen bytes; the actual data needed is only two
words: the size of the lead-in (delay) and the
size of the lead-out (padding). The rest is
only used to mark these words. Anyway, it's
trivial to declare the encoder delay and padding.
LAME did it 10 years ago. iTunes did it, oh, 4
or 5 years ago.
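
As a sketch of how those two values are used on playback: the decoder output is trimmed by the declared delay at the front and the declared padding at the back. The function name and the numbers below are purely illustrative and are not taken from any particular encoder.

```python
def trim_gapless(decoded, delay, padding):
    """Drop the encoder lead-in and the tail padding from decoded samples."""
    if delay < 0 or padding < 0 or delay + padding > len(decoded):
        raise ValueError("delay/padding do not fit the decoded length")
    return decoded[delay:len(decoded) - padding]


decoded = list(range(20))  # hypothetical decoder output, 20 samples
original = trim_gapless(decoded, delay=3, padding=5)
print(len(original), original[0], original[-1])
```

The original sample count then falls out as the decoded length minus delay minus padding.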

Post by Alessandro Angeli
mattered. They were also developed as headerless streams so there is no
way to add this information as metadata once. You would need to at least

FhG did it (halfway, but good enough) about 10
years ago, or maybe even longer since it was
in FhG 1.0 -- if you know mp3 then you already
know FhG -- the daddy of mp3. If it can put
the delay spec in an mp3 file (FhG had its own
header data, sort of like a xing header), that's
a pretty good lead to follow.

Anyway, I don't understand your reluctance to
move on. You may as well say .docx files should
not be used since .txt files don't have any markup
to them. haha

Post by Alessandro Angeli
add 1 bit per frame to signal whether the extra field is present and

yada-yada-yada... Like I said, a one-time cost
of maybe a dozen or two bytes. And if FhG could
do it without crying about it, ... well, haha,
get with the times already!

Post by Alessandro Angeli
The reason for that is because the information needed is not part of the
standard so, unless you add it through some hack from the beginning, you
have no way of recovering it afterwards.

So? LAME puts it in. And has by default for
nearly 10 years. I don't think you can make it
not put it in. Many an encoder is based on LAME.
iTunes finally got on board in v7, for both mp3
and m4a. WMA is the only format I know of that
has no gapless info, so, except for WMA-LL, it
always plays with gaps ("close to gapless" is
not gapless; some can't/don't want to notice the
difference).

--
40th Floor - Software @ http://40th.com/
PhantasmX3 - The finest sound in the world
CastleKeeper - network camera surveillance/recorder, NVR

Alessandro Angeli

2010-04-23 23:00:34 UTC

[Post body hidden by the archive.]

h***@40th.com

2010-04-24 10:20:11 UTC

Post by Alessandro Angeli

Post by h***@40th.com
You may as well say id3 is not a standard

A specification doesn't need to be published by ISO to be a standard, it

That was rhetorical. You're the one making
"standard" a big deal. None of this matters.

Gapless info is trivial. Gapless playback is
essential. If you like gaps, well, hey, there
you go. I don't, that's why I've been gapless
for going on 10 years... 8 or 9 anyway. There's
not much more that needs to be said. (Promises)

Post by Alessandro Angeli
all lossy formats I could think of have the same flaw, and for (good?)
reason.

Flawed reasoning is what you have there. It
makes for a poor argument in any case, and
it's almost like talking to a brick wall --
something dense for sure. haha

Post by Alessandro Angeli
Just as some people have decided to hack MP3 to produce MP3+tags and
correct MP3's flaws in terms of metadata (or lack thereof) and have done

Not a big deal. Why you continue on (and on)
I dunno. Is it you like listening to yourself
type? I don't use mp3 -- haven't since 2004.

(re: .txt vs .docx)

Post by Alessandro Angeli
The comparison is not really pertinent, since DOCX is not an evolution
of TXT. MP3+tags relates to MP3 like MSHTML relates to HTML.

LOL

Post by Alessandro Angeli

Post by h***@40th.com
yada-yada-yada... Like I said, a one-time cost
of maybe a dozen or two bytes. And if FhG could
do it without crying about it, ... well, haha,
get with the times already!

That doesn't even deserve a real reply.

Yeah, but that hasn't stopped you so far. FhG designed
mp3. It designed the header it used in its encoder (still
does) to say exactly how long the delay (lead-in) is.
From that, one can, with a little bit of know-how, derive
the lead-out (padding). FhG did this from its first
encoder, waaaaaay baaaack, whenever that was. 1997 or so.
It was optional. The header, too. So are seatbelts, but
who doesn't use them? Riiiight.

Post by Alessandro Angeli
Who's stopping anybody using the WMWriter to produce WMAs to add the PCM
sample count as an attribute in the header? It requires 1 call to the
WMWriter. It's even easier than what the LAME guys had to do.

Sample count and gapless are not related. I know
this topic was somehow related to sample count,
though. Not a thrilling topic (and yet, here I am).
One needs in-depth knowledge of the encoder to know
the delay. Anyone doing this after the fact will
never be sure to get it right, not even if the encoder
has the same delay for everything.

Hey, here's a deal you can't refuse. You drop it and
... well, I'll drop it no matter what you do.

--
40th Floor - Software @ http://40th.com/
PhantasmX3 - The finest sound in the world
CastleKeeper - network camera surveillance/recorder, NVR

somebody

2010-04-23 13:18:45 UTC

Post by Alessandro Angeli
From: "somebody"

Yes, I know that, but that same number of padding samples could be
remembered, right?
We need it only for the last buffer.

Post by Alessandro Angeli
field in the frame header to state how much padding was added, it is
impossible to recover the original sample count. Since this precision is
an uncommon requirement and saving bits instead is, there is no such field
(at least in MP1/MP2/MP3, I haven't checked whether they added it in AAC
or AC-3, but I doubt it -- WMA does not seem to have it, either).

You can't tell me that WM is not able to remember the provided sample count for
the stream once the compression is done. It contains tons of unusable stuff
inside of headers anyway. 4 or 8 bytes more wouldn't be a problem. That same
number could be returned when user requests the duration. Also, while
decompressing, the decompressor would know how much of PCM it has to
deliver. So, it could do some basic DSP processing if it has too much PCM at
the end, or not enough. That way you would get the same number of samples
after the decompression. Seamless file switching would be easier to
implement etc....
Without it, you are still doomed to raw PCM if you have such requirements.

Alessandro Angeli

2010-04-23 17:31:51 UTC

From: "somebody"

Post by somebody
Yes, I know that, but that same number of padding samples could be
remembered, right?
We need it only for the last buffer.

[...]

Post by somebody
You can't tell me that WM is not able to remember the provided sample
count for the stream once the compression is done. It contains tons
of unusable stuff inside of headers anyway. 4 or 8 bytes more
wouldn't be a problem. That same number could be returned when user
requests the duration. Also, while decompressing, the decompressor
would know how much of PCM it has to deliver. So, it could do some
basic DSP processing if it has too much PCM at the end, or not
enough. That way you would get the same number of samples after the
decompression. Seamless file switching would be easier to implement
etc....

I didn't write it is impossible to do, I wrote that it has not been
done, for any of the lossy formats I have checked.

And don't mistake a WMA file for a WMA stream (see my reply to Holger).

If you put the info once in the WMA header, the application can know
about it, but the decoder won't. If you put it in the data bitstream for
the decoder, you need to add it for every frame, since the decoder can
not know what frame is the last frame until it's too late (see my reply
to hel@@@th.com).

A lot of things could have been done, the point is whether they have
been (and, to a lesser degree, why they haven't).

Post by somebody
Without it, you are still doomed to raw PCM if you have such
requirements.

Or a lossless format.

--
// Alessandro Angeli
// MVP :: DirectShow / MediaFoundation
// mvpnews at riseoftheants dot com
// http://www.riseoftheants.com/mmx/faq.htm