The Kullback-Leibler divergence has a few nice properties, one of them being that [math]KL[q;p][/math] abhors regions where [math]q(x)[/math] has non-null mass while [math]p(x)[/math] has null (or nearly null) mass. This might look like a bug, but it’s actually a feature in certain situations.
If you’re trying to approximate a complex (intractable) distribution [math]p(x)[/math] with a (tractable) distribution [math]q(x)[/math], you want to be sure that any [math]x[/math] that is very improbable under [math]p(x)[/math] is also very improbable under [math]q(x)[/math]. That KL has this property is easy to see: the integrand contains a [math]q(x)\,\log\frac{q(x)}{p(x)}[/math] term. When [math]q(x)[/math] is small but [math]p(x)[/math] is not, that term contributes little. But when [math]p(x)[/math] is small, the term grows very rapidly unless [math]q(x)[/math] is also small. So, if you choose [math]q(x)[/math] to minimize [math]KL[q;p][/math], it’s very unlikely that [math]q(x)[/math] will assign much mass to regions where [math]p(x)[/math] is near zero.
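To make this concrete, here’s a small numpy sketch (the distributions are made up purely for illustration): when [math]q(x)[/math] puts noticeable mass on an outcome where [math]p(x)[/math] is nearly zero, [math]KL[q;p][/math] blows up.

```python
import numpy as np

def kl(q, p):
    # Discrete KL[q; p] = sum_x q(x) * log(q(x) / p(x)), in nats.
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(q * np.log(q / p)))

# p assigns almost no mass to the third outcome.
p  = np.array([0.5, 0.4999, 1e-4])

q1 = np.array([0.6, 0.3999, 1e-4])   # also avoids the near-zero region of p
q2 = np.array([0.35, 0.35, 0.30])    # puts 30% of its mass where p ~ 1e-4

print(kl(q1, p))   # small (~0.02 nats)
print(kl(q2, p))   # large (~2.2 nats), dominated by 0.3 * log(0.3 / 1e-4)
```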
The Jensen-Shannon divergence doesn’t have this property. It stays well behaved even when [math]p(x)[/math] is small and [math]q(x)[/math] is not (it is in fact bounded by [math]\log 2[/math]). This means it won’t penalize as heavily a distribution [math]q(x)[/math] from which you can sample values that are (nearly) impossible under [math]p(x)[/math].
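Using the same made-up distributions as above, here’s a quick sketch of the contrast, with the Jensen-Shannon divergence computed directly from its definition as the average of two KLs against the 50/50 mixture:

```python
import numpy as np

def kl(q, p):
    # Discrete KL[q; p] = sum_x q(x) * log(q(x) / p(x)), in nats.
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(q * np.log(q / p)))

def js(q, p):
    # JS[q; p] = 0.5 * KL[q; m] + 0.5 * KL[p; m], where m is the 50/50 mixture.
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    m = 0.5 * (q + p)
    return 0.5 * kl(q, m) + 0.5 * kl(p, m)

p  = np.array([0.5, 0.4999, 1e-4])
q2 = np.array([0.35, 0.35, 0.30])   # 30% of its mass sits where p ~ 1e-4

print(kl(q2, p))   # ~2.2 nats: KL punishes the mass q2 puts where p is near zero
print(js(q2, p))   # ~0.12 nats: JS stays small, and can never exceed log 2
```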