Thanks, Aran, for the clarification. Now that I understand it is by design, I'd like to raise some scenarios that may need attention or advice:
Say the app is a game with built-in music and sound effects, but the player wants to listen to his own music for just two minutes and then get back to the game, without sending the game to the background. When he pauses the music and returns to the game, he needs help from the UI to restart the game's audio, because the interruption has no "back trip". Without a smart UI design, this seems to be a major distraction.
As for designing against possible confusion: I do know that if the user locks the phone or switches to something else, the problem disappears, because we have to use the app lifecycle to control audio (e.g., to silence it). The problem only happens while the app stays in the foreground, which suggests that the user wants to return to the app soon.
If the app was in the foreground and is still there when the music pauses, I wonder how much more confusing it could be for the app to resume its audio playback than to throw complete silence at the user when he refocuses on an app that was making sounds a minute ago. He then has to realize that he must pause the entire game and switch music/SFX back on in the options menu, three taps away... and back.
I sense that the design treats app audio as lower priority than the music app, possibly a legacy of the iPod tradition. For audio-centric apps, this one-way interruption could be problematic.
Is having phone calls and alarms as the only "exceptions" a sonic sanitation measure? I'm curious because, if there are already two exceptions, why not just open up that hook and let apps decide whether to implement the "back trip" or not?
I know all of this is subjective, so I'm just bringing it up for discussion. Hopefully in the end I'll be convinced and happily take it for granted.
Thanks,
Beinan