gq and reference-style-link everything
parent
6d80ce514e
commit
35f1708c1f
@@ -1,46 +1,62 @@
|
||||
# Erlang, tcp sockets, and active true
|
||||
|
||||
If you don't know erlang then [you're missing out](http://learnyousomeerlang.com/content).
|
||||
If you do know erlang, you've probably at some point done something with tcp sockets. Erlang's
|
||||
highly concurrent model of execution lends itself well to server programs where a high number
|
||||
of active connections is desired. Each thread can autonomously handle its single client,
|
||||
greatly simplifying the logic of the whole application while still retaining
|
||||
[great performance characteristics](http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1).
|
||||
If you don't know erlang then [you're missing out][0]. If you do know erlang,
|
||||
you've probably at some point done something with tcp sockets. Erlang's highly
|
||||
concurrent model of execution lends itself well to server programs where a high
|
||||
number of active connections is desired. Each thread can autonomously handle its
|
||||
single client, greatly simplifying the logic of the whole application while
|
||||
still retaining [great performance characteristics][1].
|
||||
|
||||
# Background
|
||||
|
||||
For an erlang thread which owns a single socket there are three different ways to receive data
|
||||
off of that socket. These all revolve around the `active` [setopts](http://www.erlang.org/doc/man/inet.html#setopts-2)
|
||||
flag. A socket can be set to one of:
|
||||
For an erlang thread which owns a single socket there are three different ways
|
||||
to receive data off of that socket. These all revolve around the `active`
|
||||
[setopts][2] flag. A socket can be set to one of:
|
||||
|
||||
* `{active,false}` - All data must be obtained through [recv/2](http://www.erlang.org/doc/man/gen_tcp.html#recv-2)
|
||||
calls. This amounts to synchronous socket reading.
|
||||
* `{active,true}` - All data on the socket gets sent to the controlling thread as a normal erlang
|
||||
message. It is the thread's responsibility to keep up with the buffered data
|
||||
in the message queue. This amounts to asynchronous socket reading.
|
||||
* `{active,once}` - When set the socket is placed in `{active,true}` for a single packet. That
|
||||
is, once set the thread can expect a single message to be sent to it when data
|
||||
comes in. To receive any more data off of the socket the socket must either
|
||||
be read from using [recv/2](http://www.erlang.org/doc/man/gen_tcp.html#recv-2)
|
||||
or be put in `{active,once}` or `{active,true}`.
|
||||
* `{active,false}` - All data must be obtained through [recv/2][3] calls. This
|
||||
amounts to synchronous socket reading.
|
||||
|
||||
* `{active,true}` - All data on the socket gets sent to the controlling thread
|
||||
as a normal erlang message. It is the thread's
|
||||
responsibility to keep up with the buffered data in the
|
||||
message queue. This amounts to asynchronous socket reading.
|
||||
|
||||
* `{active,once}` - When set the socket is placed in `{active,true}` for a
|
||||
single packet. That is, once set the thread can expect a
|
||||
single message to be sent to it when data comes in. To receive
|
||||
any more data off of the socket the socket must either be
|
||||
read from using [recv/2][3] or be put in `{active,once}` or
|
||||
`{active,true}`.
|
||||
|
||||
# Which to use?
|
||||
|
||||
Many (most?) tutorials advocate using `{active,once}` in your application [0][1][2]. This has to do with usability and
|
||||
security. When in `{active,true}` it's possible for a client to flood the connection faster than the receiving process
|
||||
will process those messages, potentially eating up a lot of memory in the VM. However, if you want to be able to receive
|
||||
both tcp data messages as well as other messages from other erlang processes at the same time you can't use `{active,false}`.
|
||||
So `{active,once}` is generally preferred because it deals with both of these problems quite well.
|
||||
Many (most?) tutorials advocate using `{active,once}` in your application
|
||||
\[0]\[1]\[2]. This has to do with usability and security. When in `{active,true}`
|
||||
it's possible for a client to flood the connection faster than the receiving
|
||||
process will process those messages, potentially eating up a lot of memory in
|
||||
the VM. However, if you want to be able to receive both tcp data messages as
|
||||
well as other messages from other erlang processes at the same time you can't
|
||||
use `{active,false}`. So `{active,once}` is generally preferred because it
|
||||
deals with both of these problems quite well.
|
||||
|
||||
# Why not to use `{active,once}`
|
||||
|
||||
Here's what your classic `{active,once}` enabled tcp socket implementation will probably look like:
|
||||
Here's what your classic `{active,once}` enabled tcp socket implementation will
|
||||
probably look like:
|
||||
|
||||
```erlang
|
||||
-module(tcp_test).
|
||||
-compile(export_all).
|
||||
|
||||
-define(TCP_OPTS, [binary, {packet, raw}, {nodelay,true}, {active, false}, {reuseaddr, true}, {keepalive,true}, {backlog,500}]).
|
||||
-define(TCP_OPTS, [
|
||||
binary,
|
||||
{packet, raw},
|
||||
{nodelay,true},
|
||||
{active, false},
|
||||
{reuseaddr, true},
|
||||
{keepalive,true},
|
||||
{backlog,500}
|
||||
]).
|
||||
|
||||
%Start listening
|
||||
listen(Port) ->
|
||||
@@ -66,15 +82,16 @@ read_loop(Socket) ->
|
||||
end.
|
||||
```
|
||||
|
||||
This code isn't actually usable for a production system; it doesn't even spawn a new process for the new socket. But that's not
|
||||
the point I'm making. If I run it with `tcp_test:listen(8000)`, and in another window do:
|
||||
This code isn't actually usable for a production system; it doesn't even spawn a
|
||||
new process for the new socket. But that's not the point I'm making. If I run it
|
||||
with `tcp_test:listen(8000)`, and in another window do:
|
||||
|
||||
```bash
|
||||
while [ 1 ]; do echo "aloha"; done | nc localhost 8000
|
||||
```
|
||||
|
||||
We'll be flooding the server with data pretty well. Using [eprof](http://www.erlang.org/doc/man/eprof.html) we can get an idea
|
||||
of how our code performs, and where the hang-ups are:
|
||||
We'll be flooding the server with data pretty well. Using [eprof][4] we can
|
||||
get an idea of how our code performs, and where the hang-ups are:
|
||||
|
||||
```erlang
|
||||
1> eprof:start().
|
||||
@@ -111,18 +128,30 @@ inet:setopts/2 12303598 5.72 4533863 [ 0.37]
|
||||
erlang:port_control/3 12303600 77.13 61085040 [ 4.96]
|
||||
```
|
||||
|
||||
eprof shows us where our process is spending the majority of its time. The `%` column indicates percentage of time the process spent
|
||||
during profiling inside any function. We can pretty clearly see that the vast majority of time was spent inside `erlang:port_control/3`,
|
||||
the BIF that `inet:setopts/2` uses to switch the socket to `{active,once}` mode. Amongst the calls which were called on every loop,
|
||||
it takes up by far the most amount of time. In addition all of those other calls are also related to `inet:setopts/2`.
|
||||
eprof shows us where our process is spending the majority of its time. The `%`
|
||||
column indicates percentage of time the process spent during profiling inside
|
||||
any function. We can pretty clearly see that the vast majority of time was spent
|
||||
inside `erlang:port_control/3`, the BIF that `inet:setopts/2` uses to switch the
|
||||
socket to `{active,once}` mode. Amongst the calls which were called on every
|
||||
loop, it takes up by far the most amount of time. In addition all of those other
|
||||
calls are also related to `inet:setopts/2`.
|
||||
|
||||
I'm gonna rewrite our little listen server to use `{active,true}`, and we'll do it all again:
|
||||
I'm gonna rewrite our little listen server to use `{active,true}`, and we'll do
|
||||
it all again:
|
||||
|
||||
```erlang
|
||||
-module(tcp_test).
|
||||
-compile(export_all).
|
||||
|
||||
-define(TCP_OPTS, [binary, {packet, raw}, {nodelay,true}, {active, false}, {reuseaddr, true}, {keepalive,true}, {backlog,500}]).
|
||||
-define(TCP_OPTS, [
|
||||
binary,
|
||||
{packet, raw},
|
||||
{nodelay,true},
|
||||
{active, false},
|
||||
{reuseaddr, true},
|
||||
{keepalive,true},
|
||||
{backlog,500}
|
||||
]).
|
||||
|
||||
%Start listening
|
||||
listen(Port) ->
|
||||
@@ -194,20 +223,30 @@ erlang:port_control/3 3 0.00 59 [ 19.67]
|
||||
tcp_test:read_loop/1 20716370 100.00 12187488 [ 0.59]
|
||||
```
|
||||
|
||||
This time our process spent almost no time at all (according to eprof, 0%) fiddling with the socket opts.
|
||||
Instead it spent all of its time in the read_loop doing the work we actually want to be doing.
|
||||
This time our process spent almost no time at all (according to eprof, 0%)
|
||||
fiddling with the socket opts. Instead it spent all of its time in the
|
||||
read_loop doing the work we actually want to be doing.
|
||||
|
||||
# So what does this mean?
|
||||
|
||||
I'm by no means advocating never using `{active,once}`. The security concern is still a completely valid concern and one
|
||||
that `{active,once}` mitigates quite well. I'm simply pointing out that this mitigation has some fairly serious performance
|
||||
implications which have the potential to bite you if you're not careful, especially in cases where a socket is going to be
|
||||
receiving a large amount of traffic.
|
||||
I'm by no means advocating never using `{active,once}`. The security concern is
|
||||
still a completely valid concern and one that `{active,once}` mitigates quite
|
||||
well. I'm simply pointing out that this mitigation has some fairly serious
|
||||
performance implications which have the potential to bite you if you're not
|
||||
careful, especially in cases where a socket is going to be receiving a large
|
||||
amount of traffic.
|
||||
|
||||
# Meta
|
||||
|
||||
These tests were done using R15B03, but I've done similar ones in R14 and found similar results. I have not tested R16.
|
||||
These tests were done using R15B03, but I've done similar ones in R14 and found
|
||||
similar results. I have not tested R16.
|
||||
|
||||
* [0] http://learnyousomeerlang.com/buckets-of-sockets
|
||||
* [1] http://www.erlang.org/doc/man/gen_tcp.html#examples
|
||||
* [2] http://erlycoder.com/25/erlang-tcp-server-tcp-client-sockets-with-gen_tcp
|
||||
* \[0] http://learnyousomeerlang.com/buckets-of-sockets
|
||||
* \[1] http://www.erlang.org/doc/man/gen_tcp.html#examples
|
||||
* \[2] http://erlycoder.com/25/erlang-tcp-server-tcp-client-sockets-with-gen_tcp
|
||||
|
||||
[0]: http://learnyousomeerlang.com/content
|
||||
[1]: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1
|
||||
[2]: http://www.erlang.org/doc/man/inet.html#setopts-2
|
||||
[3]: http://www.erlang.org/doc/man/gen_tcp.html#recv-2
|
||||
[4]: http://www.erlang.org/doc/man/eprof.html
|
||||
|
27
goplus.md
@@ -1,16 +1,19 @@
|
||||
# Go and project root
|
||||
|
||||
Compared to other languages go has some strange behavior regarding its project root settings. If you
|
||||
import a library called `somelib`, go will look for a `src/somelib` folder in all of the folders in
|
||||
the `$GOPATH` environment variable. This works nicely for globally installed packages, but it makes
|
||||
encapsulating a project with a specific version, or modified version, rather tedious. Whenever you go
|
||||
to work on this project you'll have to add its path to your `$GOPATH`, or add the path permanently,
|
||||
which could break other projects which may use a different version of `somelib`.
|
||||
Compared to other languages go has some strange behavior regarding its project
|
||||
root settings. If you import a library called `somelib`, go will look for a
|
||||
`src/somelib` folder in all of the folders in the `$GOPATH` environment
|
||||
variable. This works nicely for globally installed packages, but it makes
|
||||
encapsulating a project with a specific version, or modified version, rather
|
||||
tedious. Whenever you go to work on this project you'll have to add its path to
|
||||
your `$GOPATH`, or add the path permanently, which could break other projects
|
||||
which may use a different version of `somelib`.
|
||||
|
||||
My solution is in the form of a simple script I'm calling go+. go+ will search in the current directory
|
||||
and all of its parents for a file called `GOPROJROOT`. If it finds that file in a directory, it
|
||||
prepends that directory's absolute path to your `$GOPATH` and stops the search. Regardless of whether
|
||||
or not `GOPROJROOT` was found go+ will pass through all arguments to the actual go call. The
|
||||
My solution is in the form of a simple script I'm calling go+. go+ will search
|
||||
in the current directory and all of its parents for a file called `GOPROJROOT`. If
|
||||
it finds that file in a directory, it prepends that directory's absolute path to
|
||||
your `$GOPATH` and stops the search. Regardless of whether or not `GOPROJROOT`
|
||||
was found go+ will pass through all arguments to the actual go call. The
|
||||
modification to `$GOPATH` will only last the duration of the call.
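
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the actual go+ source: the function names `find_goprojroot` and `goplus` are hypothetical, and it assumes a `go` binary on your `$PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the GOPROJROOT lookup; not the real go+ script.

# find_goprojroot DIR: print the nearest directory at or above DIR that
# contains a file named GOPROJROOT. Returns 1 if no such directory exists.
find_goprojroot() {
    local dir
    dir=$(cd "$1" && pwd -P) || return 1
    while :; do
        if [ -f "$dir/GOPROJROOT" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Reached the filesystem root without finding the marker file.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# goplus ARGS...: run `go ARGS...`, prepending the project root (if any)
# to GOPATH for the duration of that single call.
goplus() {
    local root
    if root=$(find_goprojroot "$PWD"); then
        GOPATH="$root${GOPATH:+:$GOPATH}" go "$@"
    else
        go "$@"
    fi
}
```

With these functions sourced into a shell, running `goplus build` from anywhere inside a project that has a `GOPROJROOT` file would invoke `go build` with that project's root at the front of `$GOPATH`. Because the assignment is a prefix on the `go` command, the change to `$GOPATH` does not outlive the call.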
|
||||
|
||||
As an example, consider the following:
|
||||
@@ -23,8 +26,8 @@ As an example, consider the following:
|
||||
/hello.go
|
||||
```
|
||||
|
||||
If `hello.go` depends on `somelib`, as long as you run go+ from `/tmp/hello` or one of its children
|
||||
your project will still compile.
|
||||
If `hello.go` depends on `somelib`, as long as you run go+ from `/tmp/hello` or
|
||||
one of its children your project will still compile.
|
||||
|
||||
Here is the source code for go+:
|
||||
|
||||
|