Optimizing uwsgi for Many Many Threads and Processes

tl;dr: Consider optimizing uwsgi by setting `threads-stacksize = 64` (64KB per thread) or some other small value in your uwsgi config. Python apps that avoid deep call chains and heavy use of C modules do not use much of the C stack. A smaller stack size means each thread reserves less memory, so you can safely have more of them servicing requests.

Long story:

Years ago I was deploying a new Flask web service using uwsgi. I needed it to scale to thousands of connections. I read a blog post (I searched and cannot find it now) that suggested 10 processes with 10 threads each to serve 100 concurrent connections. After testing and tuning this particular app, we settled on 10 processes and 100 threads per process. It ran well.

Recently, a production app that I helped deploy fell on its face. It was performing very poorly, seemingly out of nowhere. This app was originally deployed with the same configuration of 10 processes and 100 threads per process that I had used so successfully in the past. The ops team had already reduced the process count to 4 because of the application’s excessive memory use. This meant the application could service only 400 concurrent connections (4 processes × 100 threads).

I still cannot entirely explain why the app ran for many months and then suddenly had problems. My guess is that recent announcements drove more traffic to the site, so the 400 threads were actually being used instead of sitting idle waiting for connections.

In the process of trying to restore service, our ops team wisely used a tool I would not have been likely to reach for (huge thanks to them): pmap, which shows the memory map of a given process. I noticed something interesting in the output of pmap:

00007fcf75061000   8192K rw---   [ anon ]
00007fcf75861000      4K -----   [ anon ]
00007fcf75862000   8192K rw---   [ anon ]
00007fcf76062000      4K -----   [ anon ]

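This block repeated, with the same address increments, 50 times, for a total of one hundred 8192K regions: one per thread. It occurred to me that the default thread stack size on Linux is typically 8MB (glibc derives it from the `ulimit -s` stack limit), that these 8192K mappings were the stacks of each thread, and that the 4K mappings with no permissions were the guard pages between them. I was able to confirm this suspicion by running the app myself and adjusting the size with uwsgi’s `--threads-stacksize` option.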
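Spotting this pattern by eye works for four lines but not for hundreds. A minimal Python sketch of the idea, assuming pmap's default output format as quoted above: parse each row, then count anonymous read-write regions of exactly 8192K. Matching on size alone is a heuristic (any other 8MB anonymous allocation would be counted too), and the helper names are hypothetical.

```python
import re

# The four pmap lines quoted above: address, size, permissions, mapping.
PMAP_SAMPLE = """\
00007fcf75061000   8192K rw---   [ anon ]
00007fcf75861000      4K -----   [ anon ]
00007fcf75862000   8192K rw---   [ anon ]
00007fcf76062000      4K -----   [ anon ]
"""

# One pmap row: hex address, size in KB, permission flags, mapping name.
# Header ("PID: cmd") and "total" lines do not match and are skipped.
ROW = re.compile(r"^([0-9a-f]+)\s+(\d+)K\s+(\S+)\s+(.+)$")


def parse_pmap(text):
    """Yield (address, size_kb, perms, mapping) for each row matching ROW."""
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            addr, size, perms, mapping = match.groups()
            yield int(addr, 16), int(size), perms, mapping.strip()


def likely_thread_stacks(rows, stack_kb=8192):
    """Heuristic: anonymous read-write mappings of exactly stack_kb.

    Other anonymous allocations of the same size would also match,
    so treat the result as an estimate, not proof.
    """
    return [
        r for r in rows
        if r[1] == stack_kb and r[2].startswith("rw") and r[3] == "[ anon ]"
    ]


stacks = likely_thread_stacks(parse_pmap(PMAP_SAMPLE))
print(f"{len(stacks)} likely stacks, {sum(r[1] for r in stacks)}K total")
```

On a live worker you would feed it the text printed by `pmap <pid>` instead of the sample.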

I started by moving to 1MB, which I know is the default thread stack size on Windows, guessing it would still be plenty. Then I started to play limbo to see how low I could go. I started to get pretty happy when I broke the 256KB mark and our app was still functioning. Our app has the luxury of not having any deep calls. I might have been able to go lower, but once I got to 64KB, I didn’t see the point: each halving saved less memory in absolute terms than the one before.
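uwsgi’s `threads-stacksize` applies to the threads uwsgi itself creates. Plain Python exposes a comparable knob, `threading.stack_size()`, which makes it easy to see the same effect outside uwsgi. The sketch below is an analogy, not the uwsgi mechanism: it runs 100 shallow request handlers (`handle_request` is a hypothetical stand-in), each on a thread with a 256KB stack, and restores the previous setting afterwards.

```python
import threading


def handle_request(request_id):
    """Hypothetical stand-in for a request handler with no deep calls."""
    payload = {"id": request_id, "items": list(range(10))}
    return sum(payload["items"]) + payload["id"]


def serve_with_stack_size(stack_bytes, num_threads=100):
    """Run num_threads handler calls, each on its own thread with a reduced stack.

    threading.stack_size() only affects threads created afterwards by Python's
    threading module; it is an analogue of uwsgi's threads-stacksize, not the same knob.
    """
    results = [None] * num_threads

    def worker(i):
        results[i] = handle_request(i)

    # Returns the prior setting (0 means the platform default).
    previous = threading.stack_size(stack_bytes)
    try:
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        threading.stack_size(previous)  # restore so later threads get the old size
    return results


results = serve_with_stack_size(256 * 1024)
print(len(results), results[:3])
```

`threading.stack_size()` raises `ValueError` for sizes below 32KiB, and some platforms require a multiple of 4KiB. Code with deep recursion may crash outright on a stack that is too small, which is why stepping down gradually and testing at each size, as above, matters.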

Moving from 8MB to 1MB took the memory reserved for thread stacks from 3.2GB to 400MB. Every halving of the stack size halved the total stack memory used by this app: first 512KB/thread for 200MB, then 256KB/thread for 100MB, then 128KB/thread for 50MB, then 64KB/thread for 25MB. At this point, everything about the app was running exactly the same; the only difference was that I wasn’t wasting 3.2GB of memory on unused thread stacks.
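The arithmetic behind these numbers is simply processes × threads × stack size. A quick sketch with the 4-process, 100-thread configuration from this post, counting 1MB as 1024KB and ignoring the 4K guard page beside each stack:

```python
# Figures from the post: 4 uwsgi processes x 100 threads per process.
PROCESSES = 4
THREADS_PER_PROCESS = 100


def stack_memory_mb(stack_kb, processes=PROCESSES, threads=THREADS_PER_PROCESS):
    """Total memory reserved for thread stacks, in MB (1MB = 1024KB).

    Ignores the small guard page that sits next to each stack.
    """
    return processes * threads * stack_kb / 1024


for stack_kb in (8192, 1024, 512, 256, 128, 64):
    print(f"{stack_kb:>5}KB/thread -> {stack_memory_mb(stack_kb):>6.0f}MB")
```

The 8192KB row comes out to 3200MB, the 3.2GB above, and each following row is half the one before it.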


I Welcome Parse Developers to Juju

Hello Parse developers,

I was curious how easy it would be to get the published parse-server-example to run with Juju. The end result is that there is a new Juju charm named parse-server available.*

Deploying parse-server is as easy as running these commands in a bootstrapped Juju environment. This means that it can run ANYWHERE.

juju deploy cs:~evarlast/trusty/parse-server-0
juju deploy mongodb
juju add-relation parse-server mongodb
juju expose parse-server

You’ll then be able to use the HTTP API on port 1337.

For example:

curl -X POST -H "X-Parse-Application-Id: myAppId" -H "Content-Type: application/json" -d '{"whatever":"data"}'
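The same request can be made from Python with only the standard library. This sketch assumes things the post does not state: that the API is mounted under `/parse` (as in parse-server-example's default setup) and that objects are created by POSTing to `/classes/<ClassName>`. The `BASE_URL` host and the `GameScore` class name are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical address: replace with the address of your exposed parse-server unit.
BASE_URL = "http://localhost:1337/parse"
APP_ID = "myAppId"  # the application id used in the curl example


def build_create_request(base_url, app_id, class_name, payload):
    """Build (but do not send) a POST request equivalent to the curl example."""
    return urllib.request.Request(
        url=f"{base_url}/classes/{class_name}",  # assumed Parse REST path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Parse-Application-Id": app_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_object(base_url, app_id, class_name, payload, timeout=10):
    """Send the request and return the decoded JSON reply."""
    req = build_create_request(base_url, app_id, class_name, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `create_object(BASE_URL, APP_ID, "GameScore", {"whatever": "data"})` against a running deployment would return the server's JSON reply; building the request separately keeps it easy to inspect without a server.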

If you wish to take a look at this charm, it’s in the charmed branch of my fork of the parse-server-example. I do not recommend using this charm as an example of writing a good production charm. This is an example of a quick and dirty hack of a charm which happens to work.

Part of what makes Juju awesome is the magic of application modeling. While my hack of a parse-server charm isn’t production ready, it is building on a very production ready mongodb charm, which can be scaled out and made HA very easily. Charms are reusable open source ops. The mongodb ops have been captured in the mongodb charm. Any required parse-server ops need to be captured in a parse-server charm. The only one captured so far is configuring the mongodb relation. While it’s a hack of a demo charm, it is a start.



* The real reason is that I have cloud envy: I saw the Azure release at https://azure.microsoft.com/en-us/blog/azure-welcomes-parse-developers/ and I thought to myself, gee, that is a lot of clicks; it seems like there is a better way.