Note: this is neither a user guide for SS nor a code walkthrough, but my personal exploration of the Shadowsocks source code.
Introduction
Shadowsocks (SS) is an open source project that offers a high-performance mechanism for bypassing restrictions on access to some Internet content. The relationship between SS and a VPN is similar to that between Docker and a VM: SS is not a heavyweight VPN operating at L2 or L3. Instead, it runs on top of the transport layer (as a SOCKS5 proxy) and supports many bypass rules.
I explored SS briefly two years ago and built a service on top of the framework, but I never got a chance to understand how SS works at its core. There are a few reasons why I started looking at it again. The primary one is that I am exploring distributed training in deep learning, and a good understanding of interconnect communication is a prerequisite. The second is simply my curiosity about how it works. I also attempted to understand shadowsocks-libev when I was building the service around it, specifically targeting iOS, but I found the whole ecosystem (SS iOS client framework, iOS VPN plugin, iOS tunnel debugging, SS server, SS account management service) too big to understand, so I pretty much gave up on a deep dive. Now it is time to do some research into this project.
TCP/IP & Tun/Tap Devices
I prefer to start from the fundamentals and then move up, so I started by looking at TCP/IP source code. Most people with a CS degree have a basic understanding of networking, the OSI model, and the TCP handshake, but books and code are very different. Your understanding moves to the next level once you have read the source code. Reading the Linux source code is definitely not the best choice here since it is far too complicated, so I looked for simplified TCP/IP implementations in user space.
The first one I looked into was the level-ip project, which comes with a series of blog posts. However, the project is not very well maintained, and the blog posts are not straightforward to follow. That is a common drawback of open source projects and blog posts: authors assume readers already have all the background needed to read the posts or the code. Many open source demo projects have zero documentation in their header files, and a README.md that exists just for the sake of having one. Nonetheless, I believe open source demonstrates the best of humanity and carries us forward. I simply can’t imagine a world without public projects on GitHub, but the craftsmanship is missing from many of them.
Later, I switched to a different project: tapip. The level-ip project is based on it, and the difference is that tapip built and ran without errors or warnings for me. It also comes with many debug utilities, like this snippet:
#include <ctype.h> /* isprint */

// Print out the packet, mini Wireshark.
// buffer_t and ferr() come from the project itself.
static void pkgdbg(buffer_t *buf) {
    int i;
    int len = buf->len;
    // Printable view: 16 bytes per row, non-printable bytes shown as '.'.
    for (i = 0; i < len; i++) {
        if ((i % 16) == 0)
            ferr("%08x: ", i);
        // Cast to unsigned char: isprint() is undefined for negative values.
        if (isprint((unsigned char)buf->data[i]))
            ferr("%c", buf->data[i]);
        else
            ferr(".");
        if ((i % 16) == 15)
            ferr("\n");
    }
    if ((i % 16) != 0)
        ferr("\n");
    // Raw hex view, disabled:
    // ferr("packet buffer(raw):\n");
    // for (i = 0; i < len; i++) {
    //     if ((i % 16) == 0)
    //         ferr("%08x: ", i);
    //     if ((i % 2) == 0)
    //         ferr(" ");
    //     ferr("%02x", buf->data[i]);
    //     if ((i % 16) == 15)
    //         ferr("\n");
    // }
    // if ((i % 16) != 0)
    //     ferr("\n");
}
This project appears to be a minimal implementation of a TCP/IP stack, but it is still quite complicated, so I forked it and added some study notes.
$ cloc ./
      80 text files.
      78 unique files.
       7 files ignored.

github.com/AlDanial/cloc v 1.82  T=0.05 s (1488.4 files/s, 163686.6 lines/s)
--------------------------------------------------------------------
Language            files        blank      comment         code
--------------------------------------------------------------------
C                      37          525          749         4913
C/C++ Header           22          229           82         1234
make                   10           45            4          184
Bourne Shell            2            9           32           49
Markdown                1            8            0           44
JSON                    2            0            0           31
--------------------------------------------------------------------
SUM:                   74          816          867         6455
--------------------------------------------------------------------
It takes 6455 lines of code to implement a very simple TCP/IP stack. I put my notes in the Notes.md file of my fork. After walking through the project, you should be able to parse TCP/IP in your head.
Quick knowledge check before you move on:
- Why does TCP require a 3-way handshake? Why not a 2-way handshake?
- What is a socket and how does it work? At which OSI layer does a socket operate?
- What is a port and why is it needed?
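To make "parsing TCP/IP" a bit more concrete, here is a minimal sketch of decoding the fixed part of an IPv4 header from raw bytes, following the field layout in RFC 791. It is not taken from tapip; the names ipv4_info and parse_ipv4 are made up for illustration, and the checksum and IP options are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct for illustration; not a tapip type. */
typedef struct ipv4_info {
    uint8_t  version;     /* should be 4 */
    uint8_t  header_len;  /* header length in bytes (IHL * 4) */
    uint16_t total_len;   /* header + payload, in bytes */
    uint8_t  ttl;
    uint8_t  protocol;    /* 6 = TCP, 17 = UDP */
    uint8_t  src[4];
    uint8_t  dst[4];
} ipv4_info;

/* Decode the fixed 20-byte IPv4 header. Returns 0 on success, -1 if the
 * buffer is too short or does not look like IPv4. Checksum is not verified. */
static int parse_ipv4(const uint8_t *p, size_t len, ipv4_info *out) {
    if (len < 20)
        return -1;
    out->version    = (uint8_t)(p[0] >> 4);
    out->header_len = (uint8_t)((p[0] & 0x0f) * 4);
    if (out->version != 4 || out->header_len < 20 || out->header_len > len)
        return -1;
    out->total_len = (uint16_t)((p[2] << 8) | p[3]); /* network byte order */
    out->ttl       = p[8];
    out->protocol  = p[9];
    for (int i = 0; i < 4; i++) {
        out->src[i] = p[12 + i];
        out->dst[i] = p[16 + i];
    }
    return 0;
}
```

Feeding it the first 20 bytes of a captured packet (for example, the buffer printed by the debug helper above, after the Ethernet header) should give you the protocol number, which tells you whether a TCP or UDP header follows.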
SOCKS5
The SS client acts as a traditional SOCKS5 server and provides proxy service to its clients. You need some knowledge of the SOCKS5 protocol to understand how it works. I highly recommend following the steps in this blog post to understand SOCKS5 with Wireshark.
SOCKS5 RFC details: https://tools.ietf.org/html/rfc1928
Client <-> server communication in short:
1. TCP handshake.
2. SOCKS5 handshake:
  - client: I’d like to use this SOCKS version, and here are the auth methods I support.
  - server: here is my choice of SOCKS version and auth method.
  - client: I want to connect to this domain at this port.
  - server: looks good to me.
3. Proxy HTTP requests:
  - client -> server: the full HTTP request (once the handshake is done, no extra SOCKS5 header is added to this frame).
  - server -> client: the full HTTP response.
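The SOCKS5 handshake messages are tiny, so it may help to see the bytes. The sketch below builds the client greeting and a CONNECT request for a domain name, using the field layout from RFC 1928. It only assembles buffers and does no network I/O, and the function names are my own for illustration, not something from the SS code base.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants from RFC 1928. */
enum {
    SOCKS5_VERSION     = 0x05,
    SOCKS5_AUTH_NONE   = 0x00, /* "no authentication required" */
    SOCKS5_CMD_CONNECT = 0x01,
    SOCKS5_ATYP_DOMAIN = 0x03
};

/* Client greeting: VER | NMETHODS | METHODS...
 * Offers exactly one method (no auth). Returns bytes written, 0 on error. */
static size_t socks5_build_greeting(uint8_t *out, size_t cap) {
    if (cap < 3)
        return 0;
    out[0] = SOCKS5_VERSION;
    out[1] = 1;                /* one method follows */
    out[2] = SOCKS5_AUTH_NONE;
    return 3;
}

/* Server method selection: VER | METHOD.
 * Returns 1 if the server picked "no auth", 0 otherwise. */
static int socks5_greeting_accepted(const uint8_t *reply, size_t len) {
    return len == 2 && reply[0] == SOCKS5_VERSION &&
           reply[1] == SOCKS5_AUTH_NONE;
}

/* CONNECT request: VER | CMD | RSV | ATYP | LEN | DOMAIN | PORT (big endian).
 * Returns bytes written, 0 on error. */
static size_t socks5_build_connect(uint8_t *out, size_t cap,
                                   const char *domain, uint16_t port) {
    size_t dlen = strlen(domain);
    if (dlen == 0 || dlen > 255 || cap < 7 + dlen)
        return 0;
    out[0] = SOCKS5_VERSION;
    out[1] = SOCKS5_CMD_CONNECT;
    out[2] = 0x00;             /* reserved */
    out[3] = SOCKS5_ATYP_DOMAIN;
    out[4] = (uint8_t)dlen;    /* domain is length-prefixed, not NUL-terminated */
    memcpy(out + 5, domain, dlen);
    out[5 + dlen] = (uint8_t)(port >> 8);
    out[6 + dlen] = (uint8_t)(port & 0xff);
    return 7 + dlen;
}
```

For "example.com" on port 443 the request is 18 bytes long. After the server replies to it, the connection carries plain application data, which matches what you see in Wireshark.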
libev
SS uses libev as its event loop implementation, which is said to perform better than libevent. When I looked at the libev project on GitHub, I was surprised to see that its last update was 4 years ago. For a second I wondered why SS would use a project that is no longer updated, until I realized that SS itself is 8 years old. You can find a list of libev code examples here and read libev’s official guide here.
SS primarily uses two types of events: timer events (delayed handling, timeouts) and I/O events:
ev_io_init(&server->recv_ctx->io, server_recv_cb, fd, EV_READ);
This invokes server_recv_cb whenever fd is readable (EV_READ). There is also an EV_WRITE flag, which triggers the callback when the file descriptor is writable (meaning there is enough space in the send buffer to write to).
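To get a feel for what "readable" and "writable" mean, here is a small sketch that uses plain POSIX poll() instead of libev. POLLIN and POLLOUT play roughly the role of EV_READ and EV_WRITE. The helper name fd_ready is made up, and this only illustrates readiness semantics; it says nothing about which backend libev actually picks.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Non-blocking readiness check for a single descriptor.
 * events is POLLIN (readable) or POLLOUT (writable).
 * Returns 1 if ready, 0 if not ready, -1 on poll() error. */
static int fd_ready(int fd, short events) {
    struct pollfd p;
    p.fd = fd;
    p.events = events;
    p.revents = 0;
    if (poll(&p, 1, 0) < 0)   /* timeout 0: return immediately */
        return -1;
    return (p.revents & events) ? 1 : 0;
}
```

On a freshly created pipe, the read end is not readable until something is written, while the write end is writable right away because its buffer is empty. An event loop does the same check for many descriptors at once and calls your callback for each one that is ready.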
One piece of code that puzzled me for a while is how data (context) is passed into the callback: the server (or remote) context is recovered by casting the ev_io *w argument inside the callback, like:
static void remote_recv_cb(EV_P_ ev_io *w, int revents) {
    remote_ctx_t *remote_recv_ctx = (remote_ctx_t *)w;
    remote_t *remote = remote_recv_ctx->remote;
    ...
}
From the user guide:
“Each watcher has, by default, a void *data member that you can read or modify at any time: libev will completely ignore it. This can be used to associate arbitrary data with your watcher. If you need more data and don’t want to allocate memory separately and store a pointer to it in that data member, you can also “subclass” the watcher type and provide your own data:
struct my_io {
    ev_io io;
    int otherfd;
    void *somedata;
    struct whatever *mostinteresting;
};
…
struct my_io w;
ev_io_init (&w.io, my_cb, fd, EV_READ);
And since your callback will be called with a pointer to the watcher, you can cast it back to your own type:
static void my_cb (struct ev_loop *loop, ev_io *w_, int revents){
    struct my_io *w = (struct my_io *)w_;
    …
}
More interesting and less C-conformant ways of casting your callback function type instead have been omitted.
”
Wait a second... you pass in &w.io and then cast it back to struct my_io *? That works because io is the first field of the struct, so it shares the same address as w 😀.
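Here is a small standalone sketch of that trick, without libev. fake_watcher is a made-up stand-in for ev_io, and my_io mirrors the struct from the guide. I also added a CONTAINER_OF macro (my own addition, built on the standard offsetof) to show how the same idea works when the watcher is not the first member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libev's ev_io, for illustration only. */
typedef struct fake_watcher {
    int fd;
    int events;
} fake_watcher;

/* "Subclass": the watcher must be the FIRST member for the plain cast. */
typedef struct my_io {
    fake_watcher io;
    int otherfd;
} my_io;

/* The callback only receives the embedded watcher, like a libev callback.
 * Casting back is valid because a pointer to a struct and a pointer to its
 * first member refer to the same address. */
static int my_cb(fake_watcher *w_) {
    my_io *w = (my_io *)w_;
    return w->otherfd;
}

/* General form: subtract the member's offset to get the enclosing struct.
 * Works for a member at any position, not just the first. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
```

Calling my_cb(&w.io) on a my_io whose otherfd is 42 returns 42, even though the callback was only handed the inner watcher. The remote_recv_cb cast shown earlier presumably relies on the same layout, with the ev_io as the first member of remote_ctx_t.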
Shadowsocks White Paper
By this point, you should feel comfortable reading the Shadowsocks source code and white paper. Most of the source code is under the /src folder, and there are only about a dozen C files.
Conclusion
Finally, I want to salute the creator, maintainers, and contributors of the SS project for building it and open sourcing it. SS is technically not very complicated, but it lights up the other side of the tunnel.
This blog post will be updated as new fun facts are discovered.