Tuesday, April 28, 2015

Experiences with cpp-netlib

I've been developing an audio fingerprint query server on Ubuntu 14.04 and have chosen cpp-netlib as the HTTP library used to receive and respond to queries. I chose cpp-netlib because it seemed like it had a very simple programming interface, a lot like the tornado framework and requests library I favor in Python. You basically instantiate a class from a server template, implement a special operator() method, and read and write from the parameters to that operator() method.

In the course of doing that, I ran into some bumps while getting my code to work, and the official documentation for cpp-netlib was lacking for my needs. This was disappointing, and the fact that there seems to have been little recent activity on its GitHub page makes me question my choice of library.

However, after some digging online, I eventually resolved all my problems, so I thought I would share my findings here so others might benefit. Note that the code below assumes a using namespace std declaration and #include directives for <string> and <array>.

Declarations

Here's a portion of the header file for my query daemon class:

#include <boost/network/protocol/http/server.hpp>

class QueryDaemon;

#define ASYNC_SERVER

#ifdef ASYNC_SERVER
typedef boost::network::http::async_server<QueryDaemon> HTTPServer;
#else
typedef boost::network::http::server<QueryDaemon> HTTPServer;
#endif

class QueryDaemon
{
public:
#ifdef ASYNC_SERVER
    void operator() (
        HTTPServer::request const& request,
        HTTPServer::connection_ptr connection);
#else
    void operator() (
        HTTPServer::request const& request,
        HTTPServer::response& response);
#endif

    // . . .

};

The key aspect of this is the typedef for HTTPServer, which is used extensively in the implementation.

Accessors

The cpp-netlib headers make extensive use of template metaprogramming, which can be confusing to the novice. The declarations are often very complicated, though usage is meant to be simple. To get useful information from the request, cpp-netlib provides some template functions that can be used as accessors (wrappers, in their terminology):

    string ip_addr = source(request);
    string uri = destination(request);
    string payload = body(request);

From what I can tell, the advantage of using these accessors rather than adding simple getters to the request interface is that they provide the same encapsulation of the details for extracting information from a type, but do not require modification to the interface for that type. This assumes that the existing interface for the type is sufficient to extract the needed data.

Synchronous Servers

The response object used in the synchronous server has a stock_reply method which makes it easy to return status codes to the client. The codes themselves are in the scope of the response object, so you would use code like HTTPServer::response::ok or HTTPServer::response::internal_server_error to use them.

It's not well documented, but the response object also has a headers container into which you can add individual key/value header pairs, using an STL-standard method like push_back. The following code demonstrates all of this:

void QueryDaemon::operator() (
    HTTPServer::request const& request,
    HTTPServer::response& response)
{
    // extract useful information from the request
    string ip_addr = source(request);
    string uri = destination(request);
    string payload = body(request);

    string result;
    bool success = processQuery(payload, uri, ip_addr, result);

    if (success)
    {
        response = HTTPServer::response::stock_reply(
            HTTPServer::response::ok,
            result);

        HTTPServer::response_header content_type;
        content_type.name = "Content-Type";
        content_type.value = "application/json";

        response.headers.push_back(content_type);
    }
    else
    {
        response = HTTPServer::response::stock_reply(
            HTTPServer::response::internal_server_error);
    }
}

Note that we are assigning to the response object itself, rather than invoking a method on it, or returning a new response object from the handler function.

Asynchronous Servers

Unlike synchronous servers, asynchronous servers do not have a response object. Instead, there is a connection object that you use to respond to the client. As far as I can tell, this connection object is not well documented, but is critical to the operation of asynchronous servers.

To return a status code, invoke the set_status method on the connection object, passing a result code in the scope of the connection object: connection->set_status(HTTPServer::connection::ok);

To add key/value pairs to the response headers, invoke the set_headers method on the connection object. Note that this method takes an object supporting the Boost Single Pass Range concept, which means something that provides begin and end iterators that can be incremented to walk its elements once, like a standard container. A C++11 std::array can be used here. See the example below for some code.

Lastly, in asynchronous servers the order in which you set the result code, headers, and response body is critical. The result code must be set first, followed by the headers, followed by the response body. This implies that you must fully compute the response body before setting the result code. This is a little confusing, because you can write the response body in many chunks with the write method of the connection object. It is not clear how one should handle errors when writing multiple chunks after the result code has been set.

Here's some code for an asynchronous handler that demonstrates all this:

void QueryDaemon::operator() (
    HTTPServer::request const& request,
    HTTPServer::connection_ptr connection)
{
    // extract useful information from the request
    string ip_addr = source(request);
    string uri = destination(request);
    string payload = readBody(request, connection);  // defined later

    string result;
    bool success = processQuery(payload, uri, ip_addr, result);

    if (success)
    {
        connection->set_status(HTTPServer::connection::ok);

        array<HTTPServer::response_header, 1> headers =
        {
            { "Content-Type", "application/json" }
        };

        connection->set_headers(headers);

        connection->write(result);
    }
    else
    {
        connection->set_status(HTTPServer::connection::internal_server_error);
    }
}

Note that reading the body of the request (readBody) must be done asynchronously; I will define it in a later edit or post.

Linking

Even though cpp-netlib is mostly a header-only library, you still need to link with it, especially if you are making an asynchronous server. It also uses boost heavily internally, so you'll need to link with that, too. Here are the relevant parts of my CMakeLists.txt file showing what I had to do to get my code to compile and link:

cmake_minimum_required(VERSION 2.8)

project(MY_PROJECT)

find_package(Boost REQUIRED system thread)

set(EXTRA_CXX_FLAGS "-std=c++0x")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${EXTRA_CXX_FLAGS}")

add_executable(query_daemon QueryDaemon.cpp)

target_link_libraries(query_daemon ${Boost_LIBRARIES} cppnetlib-server-parsers)

Performance

Once I had the basics down, I did some small load tests on both the synchronous and asynchronous versions of my server. The server merely returned a "no match" result, and did not perform any actual audio fingerprint lookups, so it was really a test of the overhead of cpp-netlib in both synchronous and asynchronous modes.

My test code was a C++ program that launched 32 query threads at once. Each query thread composed a query, fired it at the server (running on localhost) using libcurl, and reported how long it waited for the response. The test machine was an 8-core Intel Xeon E3-1270 v3 running at 3.5 GHz with 16 GB RAM.
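As a rough illustration of the test setup described above, here is a minimal Python sketch. It is not the original C++/libcurl program: run_load_test and fake_query are hypothetical names, and fake_query only sleeps where a real test would send the query to the server. The sketch shows the general shape, which is to release all threads together and record how long each one waited.

```python
import threading
import time


def run_load_test(query, num_threads=32):
    """Start num_threads threads at once; return each thread's wait time in seconds."""
    latencies = [None] * num_threads
    start_gate = threading.Barrier(num_threads)

    def worker(index):
        start_gate.wait()                  # release all threads together
        started = time.perf_counter()
        query()                            # stand-in for the HTTP request
        latencies[index] = time.perf_counter() - started

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies


def fake_query():
    """Hypothetical placeholder; a real test would send the query to the server."""
    time.sleep(0.01)


if __name__ == "__main__":
    results = run_load_test(fake_query)
    print("min %.3f s, max %.3f s" % (min(results), max(results)))
```

Collecting the per-thread times in a list makes a bimodal distribution like the one reported below easy to spot, for example by sorting the list or bucketing it.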

With the synchronous server, the results were bimodal: most of the queries were handled in 1.01 seconds, and the rest took 2.01 seconds. With the asynchronous server, all the queries were handled in about 0.02 seconds or less.

I'm not sure how to take these results, since the bimodality of the synchronous server is weird and unexpected. I noticed this earlier when a different C++ test program used popen to launch a curl command to issue the query. If I used popen and curl, there was a one-second delay before my handler code was executed. If I invoked curl directly from the command line, there was no such delay and the response was more or less instant.

It is interesting, however, that if you look just at the amount of time spent in the handler function itself, the synchronous server is much faster: around 80 microseconds versus 280 microseconds for the asynchronous handler function. This agrees with the official documentation, which states:

"If your application does not need to write out information asynchronously or perform potentially long computations, then the synchronous server gives a generally better performance profile than the asynchronous server."

Ultimately, I plan to use the asynchronous model because the audio fingerprint queries can often take hundreds of milliseconds of pure computation, and I want to gracefully handle lots of concurrent connections.


Friday, April 24, 2015

Getting Ubuntu Installed Package Versions

I recently discovered the apt-show-versions tool which lists the name and version number of all installed packages on an Ubuntu system. Coupled with the column tool, you can get a nice report of what is installed on your system, and what an upgrade would give you:

smrtv@fre-build1:~$ apt-show-versions | grep ssl | column -t
libssl1.0.0/precise-security     upgradeable  from 1.0.1-4ubuntu5.11 to 1.0.1-4ubuntu5.25
openssl/precise-security         upgradeable  from 1.0.1-4ubuntu5.11 to 1.0.1-4ubuntu5.25
python-openssl/precise-security  uptodate     0.12-1ubuntu2.1
ssl-cert/precise-updates         uptodate     1.0.28ubuntu0.1

Yes, I cheated a bit on the formatting, as column is not smart enough to stop columnizing after a particular column.
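One way to get that effect without hand-editing is to align only the leading fields and pass the rest of each line through untouched. The following is a minimal Python sketch of that idea; align_leading_columns is a hypothetical helper, not part of apt-show-versions or column.

```python
def align_leading_columns(lines, num_columns):
    """Pad the first num_columns whitespace-separated fields; keep the rest as-is."""
    # split(None, n) splits on runs of whitespace at most n times, so the
    # remainder of each line stays together as one final element
    rows = [line.split(None, num_columns) for line in lines]

    # find the widest entry in each of the leading columns
    widths = [0] * num_columns
    for row in rows:
        for i, field in enumerate(row[:num_columns]):
            widths[i] = max(widths[i], len(field))

    aligned = []
    for row in rows:
        padded = [field.ljust(widths[i])
                  for i, field in enumerate(row[:num_columns])]
        aligned.append("  ".join(padded + row[num_columns:]).rstrip())
    return aligned
```

Calling it with num_columns set to 2 on the apt-show-versions lines would line up the package name and status columns and leave the "from ... to ..." text as a single trailing field.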


Tuesday, June 24, 2014

GYP

GYP is an alternative to CMake.

Wednesday, June 04, 2014

Hobbit clip

Here's a clip I made from The Hobbit:

Monday, May 19, 2014

killing processes

On a server I was working on, a bunch of stuck processes had accumulated. They were all started by cron, and each consisted of a shell command that had launched a Python process and was waiting on it. (It was the Python processes that were stuck.) If I could kill the Python process, the shell would terminate on its own. The killall command did not work, as I could not figure out how to indicate that I wanted only particular Python processes, not all of them. I eventually worked up a pipeline of commands to kill the processes:

for i in `ps waux | grep common_string | grep -v shell_string | tr -s ' ' | cut -s -f 2 -d ' '` ; do kill $i ; done

The ps command lists all processes. The grep for common_string filters for both the shell and Python processes. I could not grep for just the Python processes because their entire command line is contained within the shell command line. Thus I needed to filter out the shell processes with the grep -v shell_string. The ps command leaves a lot of spaces in its output, so I squeeze that down to single spaces with the tr -s command. And finally I cut out the second space-separated field to yield the list of process IDs I want to kill. I then run those through a for loop to kill each one in turn.
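The filtering steps above can also be sketched in Python. This is only an illustration under assumptions: select_pids is a hypothetical helper, and it expects text in the layout ps waux prints, with the process ID in the second field.

```python
def select_pids(ps_output, include, exclude):
    """Return PIDs (second field) of lines containing include but not exclude."""
    pids = []
    for line in ps_output.splitlines():
        if include in line and exclude not in line:
            # split() with no arguments collapses runs of spaces, like tr -s ' '
            fields = line.split()
            if len(fields) > 1 and fields[1].isdigit():
                pids.append(int(fields[1]))
    return pids
```

Each returned PID could then be passed to os.kill(pid, signal.SIGTERM). Keeping the selection separate from the killing makes it easy to print the list and check it before anything is terminated.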

Yes, developing this command probably took longer than typing all the kill commands manually, but by blogging my results hopefully I can leverage this technique in the future.

Wednesday, April 16, 2014

In the style of Jerry Rice...

One of my better clips:



From Super Bowl XLVIII.

Monday, April 14, 2014

Check out this clip I made from Raiders of the Lost Ark:



Classic Indiana Jones!

Wednesday, February 26, 2014

Chronic Overcommitment

This piece on InfoQ really captures the dynamics of all the software companies where I have worked, and shows the root causes of the chronic overcommitment that plagues them.

http://www.infoq.com/articles/chronic-overcommitment

When an organization sells something it does not have, which is typical in deals made through the business development department, there is pressure to deliver as soon as possible. The customer would like it now, so in a sense the project is late from the very beginning: behind by design as I like to call it.

Monday, February 24, 2014

8 Is Enough

I liked this article on the origins of the 8-hour workday, and the 90-minute cycles that might be better.

http://www.huffingtonpost.com/leonhard-widrich/the-origin-of-the-8-hour-_b_4524488.html#!

Friday, February 07, 2014

Not Being a Jerk

I hate to post one of those articles the ever watchful bots at LinkedIn send my way, but I thought this one on getting along with the people you lead and work with was pretty good:

http://www.linkedin.com/today/post/article/20140203153113-6526187-how-to-avoid-seeming-like-an-arrogant-know-it-all-jerk

I think I do most of those things, but that's not really for me to say.

Tuesday, January 21, 2014

Baker rocks

I'm having fun with baker for parsing command line arguments in Python today. It's pretty elegant. Too bad it doesn't seem to be an Ubuntu package yet. I hate mixing package systems.

Monday, January 20, 2014

Finding Multiple Needles in a Haystack

Today I needed to find out if a JSON config file contained all the cases it needed to cover. I extracted the cases from a spreadsheet (a list of numbers, really), saved them in a text file, and ran the two through the following script. It reported the four cases not covered in the config file, which was much easier than manually comparing the two lists.

#!/usr/bin/env python3
# script to make sure all words in the first file are present in the second

import sys

if len(sys.argv) != 3:
    print('Usage:', sys.argv[0], '<needles>', '<haystack>')
    sys.exit(1)  # non-zero exit status signals the usage error

# load the haystack
with open(sys.argv[2], 'r') as h:
    haystack = h.read()

# iterate through the needles, skipping blank lines and # comments
with open(sys.argv[1], 'r') as n:
    for line in n:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # substring test: a needle like 12 also matches inside 123
        if line not in haystack:
            print(line)


After I did this, I asked a colleague if there was a Unix one-liner to do the same thing. There isn't really, but you can use the comm command to get the difference between two sets: feed it the numbers from the JSON file (extracted with jq) and the cases from the spreadsheet, both sorted and uniqued [1], to achieve the same result. It might be doable in a line or two, but composing and debugging a complex command like that might take longer than writing that tiny Python program. YMMV.

[1] Or just sort -u.
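The set-difference idea behind comm can be sketched in Python as well. This is a minimal sketch with assumptions: missing_cases is a hypothetical helper, the cases are taken to be plain numbers, and the config file is scanned as text with a regular expression rather than parsed as JSON.

```python
import re


def missing_cases(needles_text, haystack_text):
    """Return needles that do not appear as whole numbers in the haystack, sorted."""
    needles = set()
    for line in needles_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            needles.add(line)

    # every maximal run of digits in the haystack counts as one number
    haystack_numbers = set(re.findall(r"\d+", haystack_text))

    # set difference: needles with no matching number in the haystack
    return sorted(needles - haystack_numbers)
```

Unlike a substring test, this matches whole numbers only, so a case such as 12 is still reported as missing when the config file contains only 120.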

Friday, January 10, 2014

many media files

I just learned about this trove of example files in various media formats.

http://samples.mplayerhq.hu

What a resource!

Tuesday, January 07, 2014

OKR: Objectives and Key Results

I like the OKR system that Google and Intel use for managing objectives and measuring results:

http://www.businessinsider.com/googles-ranking-system-okr-2014-1

Tuesday, August 23, 2011

jStat JavaScript Statistics

Interesting, but young: http://www.jstat.org/

Tuesday, June 28, 2011

S4 Distributed Stream Computing Platform

Worth checking out: http://s4.io/

Tuesday, June 21, 2011

OpenShot Video Editor for Linux

Ubuntu Unity Keyboard Shortcuts

Friday, June 17, 2011

Mp3 decoder implemented in JavaScript

http://www.geek.com/articles/chips/javascript-decoder-lets-mp3s-play-in-firefox-without-flash-20110617/

I think a lot of this came out of a music hack day in Berlin. If you can decode MP3s in JavaScript, you can fingerprint, too. Interesting....

BTW, the jsmad (JavaScript) and libmad (fixed-point C) code that it is based on would be great to learn from.

Thursday, June 16, 2011

WikiVS - A Comparison-centric Wiki

WikiVS has lots of interesting comparisons about related things. I just read the git vs. mercurial comparison, and added a brief page comparing Matlab and Octave. I expect to come to rely on this site a lot, though it needs a lot more data.

Tuesday, June 14, 2011

Flash-free audio in HTML 5

Monday, May 16, 2011

Berkeley engineering professional masters program

Tuesday, May 10, 2011

Interesting Devices

Monday, May 02, 2011

BILLION DOLLAR-O-GRAM

Saturday, April 30, 2011

Maqetta

Visual authoring of HTML5 user interfaces: http://maqetta.org/

CubeSats

Tuesday, April 26, 2011

iPhone AR

7 up-and-coming languages

iPad/iPhone App: SoundPrism

SoundPrism:

http://www.youtube.com/watch?v=385CymvTecU

The interesting thing about this software is that these guys developed a new tone model, called the spiral model, which was also used a bit in harmony recognition, because it supports easier look-up of “correct” chords.

Information about how Kinect works

This Slashdot page has three links that might be worth exploring:

http://games.slashdot.org/story/11/03/26/2014234/Kinects-AI-Breakthrough-Explained

This makes me curious about decision forests.

Hypergraphs

Hypergraphs: This short article mentions that they might be useful for recommendation systems.
http://www.infoq.com/news/2011/04/Trinity;jsessionid=05DAE97785AD8E755411B874C55D9EAB

Impressive object tracking software

http://www.i-programmer.info/news/105-artificial-intelligence/2310-predator-better-than-kinect.html
Watch the video for the real demo. Real-time, one OpenCV API, no GPU, GPL open source. Very impressive.

Interesting concepts

Saturday, April 16, 2011

Computational Thinking

Computational thinking. I like this concept.

Friday, June 15, 2007

Cross-Platform C++ UI Libraries

I used to think that Qt was the only choice, but I just learned about Juce, which looks pretty good, too. Qt has an expensive license for Windows while Juce is free GPL open source. There are probably more. I'll post them here as I discover them.

Tuesday, February 27, 2007

Software & Patents

Here's a good, easy-to-read (although long) article on patents and software:

Wednesday, February 21, 2007

Volcanoes

Here are a couple of volcano videos my daughter likes:

Tuesday, February 20, 2007

Cat Flushes Toilet

My daughter really likes a story of a cat that flushes a toilet over and over again. I found this video to go along with it:

The story is that a man notices that his water bill is much higher than usual. He looks through the house for leaks and finds none. One day he is home sick, trying to sleep, and the house is very quiet. He hears the sound of running water, thinks that must be the leak, and goes looking for its source. It is coming from the basement bathroom, and when he looks in, the family cat is flushing the toilet over and over, watching the water spiral down the drain.

Thursday, February 01, 2007

Java Code Checking Tools

I recently came across two open source projects that look for bugs and questionable coding in Java source code. They are FindBugs and JLint. FindBugs seems to be more actively maintained, but JLint has that strong connection to the C/C++ tool lint.

Wednesday, December 06, 2006

Custom Photo Flip Books

I saw one of these a few months ago, and it was pretty cool. It's a flip book that shows a movie that you upload to their site. It works well with short movies that you can make on your digital camera. It's a great gift idea, especially for relatives who have too much stuff already.

Tuesday, October 10, 2006

Cars Movie

It's about time I stopped googling for this and simply linked it up.

Saturday, October 07, 2006

Old Software Versions

I was looking for an old copy of the WinZip command-line utility, and I stumbled on this site archiving old software versions, "because newer isn't always better" as their motto says. There are also several other sites, but the one above was the least annoying.

Saturday, September 09, 2006

Cape Breton Stepdance

Another YouTube dance clip popular with my daughter:

Ballet Video

Here's a ballet video that my daughter likes a lot:

Friday, September 01, 2006

The Acts of Gord: Stories of a Real Life Comic Book Guy

The Acts of Gord is a site describing the experiences of a real life comic book guy, like the one in The Simpsons. Some of these stories are fantastic. What a character.

Pooping Cat

My daughter really liked this clip of a cat pooping in a toilet.

Japanese Potty Training Clip

Here's the famous Japanese potty training clip. Very funny.

Tuesday, August 08, 2006

Extreme Urban Gymnastics

Here's a video of a guy climbing the sides of buildings, running up walls, and other unbelievable stunts. Give it time to get going. Wow.

Russian Parkour

Very impressive. Yes, there really are video coding artifacts near the beginning of the video.

Friday, July 14, 2006

Googlewashing

Here's a good article on how Google can obscure and replace the original meaning of a term: Googlewashing, as it has been named. While Google is certainly convenient, it is dangerously powerful: by reflecting the content of popular web sites and blogs, it allows those sites to construct their own reality, which it in turn promotes to the rest of the world. Don't believe everything that you read.

Thursday, June 29, 2006

Brokeback to the Future

I stumbled across this hilarious parody the other day. It's a movie trailer in the style of Brokeback Mountain featuring clips from the Back to the Future trilogy, but the clips are taken out of context so that a homosexual affair between the Michael J. Fox and Christopher Lloyd characters is suggested. Genius.

Brokeback to the Future

Wednesday, June 28, 2006

New Homepage

I just uploaded a new homepage to my web space. It will be a platform on which I can host content that I want to share.