Hi Yoann,
It sounds to me like you're not padding your incoming audio stream with silence when you miss a packet. If you're just feeding the audio queue each audio packet as it arrives, then every time you lose a packet in transit (say, 5 ms of audio) without inserting 5 ms of silence where it should have been, your playback jumps 5 ms ahead of where it should be, and it drifts further out of sync with each lost packet.
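Here's a rough sketch of the padding idea, in Python just to keep it short. It assumes things I don't know about your setup: that each packet carries a sequence number, that every packet holds the same 5 ms of mono samples, and that sequence numbers never wrap around. The names (pad_missing, SAMPLES_PER_PACKET and so on) are made up for illustration and don't come from any particular audio API.

```python
SAMPLE_RATE = 48000   # assumed sample rate in Hz
PACKET_MS = 5         # assumed duration of every packet
SAMPLES_PER_PACKET = SAMPLE_RATE * PACKET_MS // 1000  # 240 samples


def pad_missing(packets, expected_seq):
    """Fill gaps left by lost packets with silence.

    packets: iterable of (seq, samples) pairs in arrival order.
    expected_seq: sequence number we expect to see next.
    Returns (output_samples, next_expected_seq).
    """
    out = []
    for seq, samples in packets:
        if seq < expected_seq:
            # Late or duplicate packet: its slot was already filled, drop it.
            continue
        # One packet's worth of zeros for every sequence number we skipped.
        missing = seq - expected_seq
        out.extend([0] * (missing * SAMPLES_PER_PACKET))
        out.extend(samples)
        expected_seq = seq + 1
    return out, expected_seq
```

If packets 0, 1 and 3 arrive, the output is still four packets long, with zeros where packet 2 should have been, so the timeline stays intact. A real receiver would also have to handle sequence-number wraparound and decide how long to wait for a late packet before declaring it lost.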
You also need to be aware of issues with discontinuity if you're going for decent-sounding audio in the presence of packet loss. If you merely insert silence in place of lost packets, the waveform jumps abruptly to zero and back again, and you'll tend to get an audible click at the start and end of that region. There are various techniques to mitigate this problem (the area is called 'error concealment', or 'packet loss concealment' when it's network audio).
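One of the simplest ways to soften those clicks is to ramp the audio down just before the gap and back up just after it. The sketch below is only that, a linear fade, and not a full concealment scheme; the function names and the fade length n are my own assumptions. Note that fading out the packet before a loss means you need to know about the loss before that packet has finished playing, which in practice costs you some buffering latency.

```python
def fade_out(samples, n):
    """Return a copy with the last n samples ramped linearly down to zero."""
    out = list(samples)
    n = min(n, len(out))  # never fade more samples than we have
    start = len(out) - n
    for i in range(n):
        # Gain falls from (n-1)/n to exactly 0 on the final sample.
        out[start + i] *= (n - 1 - i) / n
    return out


def fade_in(samples, n):
    """Return a copy with the first n samples ramped linearly up from zero."""
    out = list(samples)
    n = min(n, len(out))
    for i in range(n):
        # Gain rises from exactly 0 on the first sample to (n-1)/n.
        out[i] *= i / n
    return out
```

You would apply fade_out to the last good packet before the silence and fade_in to the first good packet after it. The results are floats, so with integer PCM you'd round back to your sample format afterwards. A fade of a millisecond or two is a reasonable starting point to experiment with; better-sounding schemes fill the gap with something derived from the preceding audio instead of silence.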
Speaking generally, where you have the option to do so it's best to use an existing, tried-and-true implementation, rather than writing your own from scratch, especially if you're not familiar with audio programming. (Says the guy who just finished writing his own implementation. D'oh.)