Here is a sketch of a partial answer. Let $k=l=N$, $n_0=0$ from now on to simplify notation. I will give an algorithm producing an approximate answer that, under some broad assumptions, can be made exact in time about $O(N^{3/2} \log N)$. That is far from the time $O(N \log N)$ or $O(N^{1+\epsilon})$ that I would like, but it is substantially better than the trivial bound $O(N^2)$.
Again for simplicity, I will assume $a_i$, $b_i$ are rationals with denominator $N$, though that is not essential in what follows. Actually, let me renormalize things so that $a_i$, $b_i$ are integers, and we are trying to determine $(F\ast g)(n N)$ for $n=0,1,\dotsc, N-1$. (None of these assumptions are really essential, but they do simplify notation.) If you wish, you may assume that $N$ is a power of $2$. We will work with Fourier transforms defined over $\mathbb{Z}/N^2 \mathbb{Z}$.
This problem reduces to that of computing $$h(n) = \sum_{j=0}^{N-1} (f\ast g)(n N +j)$$ for $n=0,1,\dotsc, N-1$, where $f$ is a linear combination of the delta functions $\delta_{a_i}$. We can of course write $h$ as a convolution of $f$, $g$ and the characteristic function $J$ of $\{0,1,\dotsc,N-1\}$.
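For concreteness, here is a small numerical sanity check of this reduction (my own sketch in numpy, not part of the argument; the points $a_i$, $b_i$ are just random): with $J$ the characteristic function of $\{0,\dotsc,N-1\}$ on $\mathbb{Z}/N^2\mathbb{Z}$, the block sums $h(n)$ are exactly the triple convolution $f\ast g\ast J$ sampled at the points $nN+N-1$, since $(f\ast g\ast J)(m) = \sum_{j=0}^{N-1}(f\ast g)(m-j)$.

```python
import numpy as np

N = 8
Q = N * N                      # we work over Z/N^2 Z
rng = np.random.default_rng(0)

# f, g: sums of N delta functions at random stand-ins for the a_i, b_i
f = np.zeros(Q); f[rng.choice(Q, size=N, replace=False)] = 1.0
g = np.zeros(Q); g[rng.choice(Q, size=N, replace=False)] = 1.0
J = np.zeros(Q); J[:N] = 1.0   # characteristic function of {0,...,N-1}

# h(n) = sum_{j=0}^{N-1} (f*g)(nN+j), computed directly...
conv_fg = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real
h_direct = conv_fg.reshape(N, N).sum(axis=1)

# ...and as the triple convolution f*g*J sampled at m = nN + N - 1
triple = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g) * np.fft.fft(J)).real
h_fft = triple[np.arange(N) * N + N - 1]

assert np.allclose(h_direct, h_fft)
```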
The Fourier transform of that double convolution is the pointwise product $\widehat{f} \cdot \widehat{g} \cdot \widehat{J}$. Now the interesting thing is that $\widehat{J}$ is concentrated in the interval $\lbrack -N,N \rbrack$: it is basically a Dirichlet kernel normalized at scale $N$. Of course one of the issues with the Dirichlet kernel is that it doesn't decay that rapidly; if only it basically vanished outside $I_C = \lbrack - C N,C N\rbrack$, we would be done.
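To see both claims concretely (again a sketch of my own): over $\mathbb{Z}/N^2\mathbb{Z}$ one has $|\widehat{J}(k)| = |\sin(\pi k N/N^2)/\sin(\pi k/N^2)|$, which peaks at $N$ for $k=0$ and has its first zeros at multiples of $N$, but whose peaks just beyond $|k| = CN$ still have size about $N/(\pi(C+1/2))$.

```python
import numpy as np

N = 64
Q = N * N
J = np.zeros(Q); J[:N] = 1.0
Jhat = np.fft.fft(J)           # |Jhat(k)| = |sin(pi k N/Q) / sin(pi k/Q)|
k = np.arange(Q)

assert np.isclose(abs(Jhat[0]), N)             # peak of size N at k = 0
assert np.isclose(abs(Jhat[N]), 0, atol=1e-8)  # first zero at k = N

# slow decay: just outside |k| = CN the kernel still has peaks of
# size about N/(pi(C + 1/2))
C = 4
tail = np.abs(Jhat[(k > C * N) & (k < Q - C * N)])
assert tail.max() > N / (np.pi * C) / 2
```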
It does seem to me that one can compute the restriction of $\widehat{f}$ and $\widehat{g}$ to $I_C$ in time $O(C N \log(N^2)) = O(N \log N)$ using a modified Cooley-Tukey algorithm. Now, if we simply replace $\widehat{J}$ by its restriction to $I_C$, we get an algorithm that runs in time $O(N \log N)$ -- but of course it would compute $\widehat{f}\cdot \widehat{g} \cdot \widehat{J}_{|I_C}$ rather than $\widehat{f}\cdot \widehat{g} \cdot \widehat{J}$.
What we can do is replace $J$ by an approximation $J_M$, equal to $J$ outside $\{0,\dotsc,M-1\} \cup \{N-M,\dotsc,N-1\}$, say. We can take $J_M$ to be a discrete approximation to a function of the form $$\jmath_M(x) = \int_0^x \left(\frac{N}{M} \phi\left(\frac{N}{M} t\right) - \frac{N}{M} \phi\left(\frac{N}{M} (1-t)\right)\right) dt$$ for $\phi$ a bump function with support on $\lbrack 0,1\rbrack$ and very fast decay (say, $\widehat{\phi}(t) = O(e^{-\sqrt{t}})$, as is the case for a standard choice of $\phi$). Then $\widehat{J_M}$ will be all but supported on $I_{C N^2 L^2/M} = \lbrack -C N^2 L^2/M, C N^2 L^2/M\rbrack$, where $L = \log N$, and we can truncate it to such an interval with a total error of size $o(1/N)$, or even $O(1/N^{10})$. (Either should be enough; for instance, if we expect an integer as an answer at the very end, an error $<1/2$ can be eliminated by rounding. An error $<1/2$ has size $<1/(2N)$ relative to a result that is $\leq N$, obviously.) We can compute $\widehat{f}\cdot \widehat{g} \cdot \widehat{J_M}_{|I_{C N^2 L^2/M}} \sim \widehat{f}\cdot \widehat{g} \cdot \widehat{J_M}$ in time $O(C N^2 L^2/M)$. Then, applying a modified Cooley-Tukey algorithm again, we obtain $f\ast g\ast J_M$ rather than $f\ast g\ast J$.
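As an illustration of the effect of the smoothing (a numerical sketch only; I use a raised-cosine ramp of width $M$ as a convenient stand-in for the bump-function construction above, so the decay is only polynomial rather than $e^{-\sqrt{t}}$), smoothing the edges of $J$ over $M$ points already shrinks the spectral tail markedly:

```python
import numpy as np

N = 64
Q = N * N
M = 16                                   # width of the smoothed edges

J = np.zeros(Q); J[:N] = 1.0

# J_M: same as J except that the edges rise/fall over M points;
# a raised-cosine ramp stands in for the bump-function construction
ramp = 0.5 * (1 - np.cos(np.pi * (np.arange(M) + 0.5) / M))
JM = J.copy()
JM[:M] = ramp
JM[N - M:N] = ramp[::-1]

k = np.fft.fftfreq(Q, d=1.0 / Q)         # integer frequencies, signed
K = Q // 8                               # truncation threshold
tailJ = np.abs(np.fft.fft(J))[np.abs(k) > K].max()
tailJM = np.abs(np.fft.fft(JM))[np.abs(k) > K].max()

assert tailJM < tailJ / 5                # smoothing shrinks the tail
```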
Now, the difference between $f\ast g\ast J_M$ and $f\ast g\ast J$, taken over all points $n N$, will consist of $O(N M)$ errors, assuming some mild equidistribution conditions (which will hold for typical inputs $a_i$, $b_i$). We can eliminate them by inspection in time $O(N M)$. We set $M = L \sqrt{C N}$ and are done.
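Putting the pieces together, here is a toy end-to-end version (my own sketch; for numerical convenience I smooth $J$ with a narrow Gaussian bump, which is not compactly supported but has negligible tails, rather than with the $\phi$ above, and I skip the modified Cooley-Tukey steps, using full-length FFTs instead; so this checks the correctness of the truncate-then-correct scheme, not the claimed running time):

```python
import numpy as np

N = 128
Q = N * N
s = 8                          # Gaussian smoothing scale
M = 4 * s                      # (numerical) width of the smoothed edges
rng = np.random.default_rng(1)

f = np.zeros(Q); f[rng.choice(Q, size=N, replace=False)] = 1.0
g = np.zeros(Q); g[rng.choice(Q, size=N, replace=False)] = 1.0
J = np.zeros(Q); J[:N] = 1.0

# J_M: J smoothed by a narrow Gaussian bump (stand-in for phi)
t = np.arange(Q); t[t > Q // 2] -= Q
bump = np.exp(-0.5 * (t / s) ** 2); bump /= bump.sum()
JM = np.fft.ifft(np.fft.fft(J) * np.fft.fft(bump)).real

# truncate JM-hat to |k| <= K; the Gaussian decay of the spectrum
# makes the discarded mass negligible
K = Q // 8
kk = np.fft.fftfreq(Q, d=1.0 / Q)
JMhat = np.fft.fft(JM)
JMhat[np.abs(kk) > K] = 0.0

# approximate h from the truncated product, sampled at m = nN + N - 1
approx = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g) * JMhat).real
approx = approx[np.arange(N) * N + N - 1]

# correction f*g*(J - J_M), supported (numerically) on ~2M edge
# positions around 0 and N: O(NM) work in total
conv_fg = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real
edge = np.concatenate([np.arange(-M, M), np.arange(N - M, N + M)])
corr = np.array([sum((J[j % Q] - JM[j % Q]) * conv_fg[(n * N + N - 1 - j) % Q]
                     for j in edge) for n in range(N)])

# exact block sums, for comparison; rounding removes the o(1) error
h_exact = conv_fg.reshape(N, N).sum(axis=1)
assert np.all(np.round(approx + corr) == np.round(h_exact))
```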
I will be the first to say that I am unsatisfied with the above: (a) it takes too much time by a factor of $\sqrt{N}$; (b) it works with continuous approximations when we should be able to proceed discretely.
Still, I hope the picture is now clearer. We are allowed to sample $O(N)$ values of $\widehat{f}$ and $\widehat{g}$, and then use them to construct a function of the form $\widehat{f} \cdot \widehat{g} \cdot \eta$, with support of size $O(N)$, whose transform we can then take at $O(N)$ places.
(It would seem, then, that the question is at least loosely related to sparse Fourier transforms, compressed sensing and the like, in a very rough sort of sense.)
Here is an obvious question. We have sampled $\widehat{f}$ and $\widehat{g}$ at all integer points in an interval $I_{C N^2 L^2/M}$. Can we sample them at a better choice of points, so that only $O(N)$ or $O(N \log N)$ points will be needed? The choice of points may depend on $f$ and $g$. (A friend points out that choosing the points to be in an arithmetic progression sounds like the worst choice possible; from that perspective, it's almost surprising that taking the points to be those in the interval $\lbrack -CN,CN\rbrack$ comes close to working.)