Fix songs skipping and looping in koel
I use koel to make my music library accessible on my phone and work computer. Since setting it up, I've had an issue where songs won't play to the end. On the mobile client, a song will play partway, then skip back to beginning and thus loop endlessly. In a desktop browser, the same problem shows up as the song stopping partway through, with playback skipping to the next song in the playlist.
I fixed this by increasing the send_timeout
on my nginx
servers hosting koel. In this article, I'll describe my configuration and debugging process, which may help you if you're facing a similar issue.
My setup
I've installed koel inside a jail on my FreeBSD server, served with nginx and fastcgi. A separate jail serves as the gateway to all the websites on this server. I figured that the problem could be in a) the koel client or the web browser b) nginx c) the koel server or its local environment.
Initial investigation
I opened up DevTools and started playing a song that previously would only play partially. I noticed that a different amount of data was loaded for this song before the skip occured each time.
Since many of my songs are encoded in ALAC and have to be transcoded, I wondered if the ffmpeg
was failing before the entire file was processed.
To test this possibility, I copied the request for the audio file as a curl command. When run in a local terminal, the full file downloaded each time without issues. This seemed to rule out the issue being with transcoding in ffmpeg
.
At this point, I suspected that the client was slowly buffering the audio file (different sizes loaded in client each time), but that the connection was being closed before the client was finished.
Timeout investigation
Next, I took a look at the nginx server logs. The access log in both of the jails showed that the full file was transferred (6762794 bytes), even though the browser only received it partially:
192.168.4.1 - - [18/Feb/2023:10:03:17 -0500] "GET /play/9134030d-7016-4771-oiw3n-93cnkq28hkl9?api_token=... HTTP/2.0" 200 6762794
This was confusing because it seemed likely that nginx was closing the connection. To try to isolate nginx, I started koel with php artisan serve
and accessed it directly. Running without nginx in the loop, songs still skipped! This suggested that the koel client could possibly be responsible, though nginx was still a possibility. Maybe the client was closing the connection early?
Is it the client?
To get more visibility into the request, I took a few packet captures with Wireshark.
The captures showed that:
- The requests were sent slowly, over a long time (116 seconds in above capture), in roughly 1 kB chunks.
- The requests suddenly stopped, with no apparent communication between server and client to prompt this.
I wondered if the transcoding step might be timing out in fastcgi/PHP when the file was fetched slowly. However, since the nginx access log always reported that the whole file was received from the koel server, this didn't make sense. Since the request seemed to be dropped suddenly, after nginx had received it from the koel server, it looked again like the problem was in nginx.
More logs from nginx
I increased the verbosity of the nginx error log by adding a log level to the config line:
error_log /var/log/nginx/error.log debug;
I loaded up the test song again, waited for it to skip, and watched the logs:
$ tail -n0 -f error.log
2023/02/18 12:26:53 [warn] 43962#103332: *137 an upstream response is buffered to a temporary file /var/tmp/nginx/fastcgi_temp/0/01/0000000010 while reading upstream, client: 192.168.4.50, server: localhost, request: "GET /play/9134030d-7016-4771-8af6-cd9c8e96052a?api_token=... HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.sock:", host: "192.168.4.218:2618", referrer: "http://192.168.4.218:2618/"
2023/02/18 12:27:56 [info] 43962#103332: *137 client timed out (60: Operation timed out) while sending to client, client: 192.168.4.50, server: localhost, request: "GET /play/9134030d-7016-4771-8af6-cd9c8e96052a?api_token=... HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.sock", host: "192.168.4.218:2618", referrer: "http://192.168.4.218:2618/"
The first line shows why nginx always reported the correct request size in its access log. Even though the client slowly buffers the response, nginx loads all the data from fastcgi initially. This also shows why a timeout in the PHP interpreter wasn't the issue.
The second line shows our real problem: client timed out
. It looks like there's a roughly 60 second limit on how long the client can take to load the response. A helpful thread in the nginx mailing list suggests that the send_timeout
needs to be increased.
I increased the send_timeout
in both nginx configs to 600 seconds, and now songs no longer skip! 🎉
Update 2023-02-25: this change was merged into the example nginx config for koel.