Archive for September, 2009

Graphviz

Sunday, September 27th, 2009

Some weeks ago I found this program called Graphviz, seems rather interesting, can’t wait to use it in any future project documentation I get involved with. Particularly useful to design/document state machines that are common in telephony.

Quick tip for debugging deadlocks

Sunday, September 27th, 2009

If you ever find yourself with a deadlock in your application, you can use gdb to attach to the application, then sometimes you find one of the threads that is stuck trying to lock mutex x. Then you need to find out who is currently holding x and therefore deadlocking your thread (and equally important why the other thread is not releasing it).

At least on recent libc implementations in Linux, the mutex object seems to have a member named “__owner”. Let me show you what I recently saw when debugging a deadlocked application.

(gdb) f 4
#4  0x0805ab46 in ACE_OS::mutex_lock (m=0xa074248) at include/ace/OS.i:1406
1406      ACE_OSCALL_RETURN (ACE_ADAPT_RETVAL (pthread_mutex_lock (m), ace_result_),
(gdb) p *m
$8 = {__data = {__lock = 2, __count = 0, __owner = 17828, __kind = 0, __nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
  __size = "\002\000\000\000\000\000\000\000�E\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2}

We can see that the __owner is 17828. This number is the LWP (Light-weight process) id of the thread holding the lock. Now you can go to examine that thread stack and find out why that thread is also stuck.

This example also brings up a regular point of confusion for some Linux application developers. What is the difference between LWP and POSIX thread id ( the pthread_t type in pthread.h)?. The difference is that pthread_t is a user space concept, is simply an identifier for the thread library implementing POSIX threads to refer to the thread and its resources, state etc. However the LWP is an implementation detail of how the Linux kernel implements threads, which is done through the “thread group” concept and LWP’s, that are processes that share memory pages and other resources with the other processes in the same thread group.

From the Linux kernel point of view the pthread_t value doesn’t mean anything, the LWP id is how you identify threads in the kernel, and they share the same numbering as regular processes, since LWPs are just a special type of process. Knowing this is useful when using utilities like strace. When you want to trace a particular thread of a multi threaded application, you need to provide the LWP of the thread you want to trace, a common mistake is to provide the process id, which in a multithreaded application the process id is just the LWP of the first thread in the application (the one that started executing main()).

Here is how you get each identifier in a C program:

#include <stdio.h>
#include <syscall.h>
#include <pthread.h>
 
int main()
{
  pthread_t tid = pthread_self();
  int sid = syscall(SYS_gettid);
  printf("LWP id is %d\n", sid);
  printf("POSIX thread id is %d\n", tid);
  return 0;
}

It’s important to note that getting the POSIX thread id is much faster than the LWP, because pthread_self() is just a library call and libc most likely has this value cached somewhere in user space, no need to go down to the kernel. As you can see, getting the LWP requires a call to the syscall() function, which effectively executes the requested system call, this is expensive (well, compared with the time required to enter a simple user space function).

Sangoma Tapping solution for Asterisk

Saturday, September 26th, 2009

Sangoma has offered a tapping device for T1/E1 lines for quite some time now. This physical tapping works by installing a PN 633 Tap Connection Adapter, which looks like this:

Sangoma Tapping

As you can see, you will receive the line from the CPE and plug it into the PN 633 and then plug the line from the NET too. At the other side of the adapter you can pull out 2 cables and connect them to your box with Sangoma boards specially configured in high impedance mode, which basically means will be passive and not affect the behavior of your PRI link, passively will be monitoring the E1/T1 link.

As you can tell from the image, there is also 2 cables involved for a single link, this means that you need an A102 card to monitor just 1 E1/T1 link, because one port is used for tx from the CPE and the other for tx from the NET. Until today, it was not possible to use Asterisk because Asterisk assumes each card port is meant to send and receive data for a circuit, Asterisk was meant to be an active component of the circuit, not a passive element.

Now Asterisk can do that. I just submitted 2 patches to Digium’s bug tracker. One for LibPRI, the library that takes care of the Q921 and Q931 signaling and the other for chan_dahdi.c, the Asterisk channel driver used to connect it to PSTN circuits.

LibPRI needed to be modified because it does a lot of state machine checking when receiving a Q921/Q931 frame, for example, if receives a Q931 message “PROCEEDING”, is going to check that a call was already in place and a SETUP message was previously sent, when working as a passive element in the circuit no state checks should be done, we just need to decode the message and return a meaningful telephony event to Asterisk (like a RING event when the SETUP message is seen on the line).

https://issues.asterisk.org/view.php?id=15970

The most important change is this:

pri_event *pri_read_event(struct pri *pri)
{
        char buf[1024];
        int res;
        res = pri->read_func ? pri->read_func(pri, buf, sizeof(buf)) : 0;
        /* this check should be at some routine in q921.c */
        /* at least 4 bytes of Q921 and at check buf[4] for Q931 Network packet */
        if (res < 5 || ((int)buf[4] != 8)) {
                return NULL;
        }
        res = q931_read_event(pri, (q931_h*)(buf + 4), res - 4 - 2 /* remove 4 bytes of Q921 h and 2 of CRC */);
        if (res == -1) {
                return NULL;
        }
        if (res & Q931_RES_HAVEEVENT) {
                return &pri->ev;
        }
        return NULL;
}
 
/* here we trust receiving q931 only (no maintenance or anything else)*/
int q931_read_event(struct pri *ctrl, q931_h *h, int len)
{
        q931_mh *mh;
        q931_call *c;
        int cref;
        int missingmand;
 
        //q931_dump(ctrl, h, len, 0);
 
        mh = (q931_mh *)(h->contents + h->crlen);
 
        cref = q931_cr(h);
 
        c = q931_getcall(ctrl, cref & 0x7FFF);
        if (!c) {
                pri_error(ctrl, "Unable to locate call %d\n", cref);
                return -1;
        }
 
        if (prepare_to_handle_q931_message(ctrl, mh, c)) {
                return 0;
        }
 
        missingmand = 0;
        if (q931_process_ies(ctrl, h, len, mh, c, &missingmand)) {
                return -1;
        }
 
        return post_handle_q931_message(ctrl, mh, c, missingmand, 0);
}

The rest is pretty much just modify post_handle_q931_message to skip state machine checking when invoked with the last parameter as 0, which is only done from q931_read_event, that is the passive q931 processing routine I added.

Asterisk needed to be modified because it needs to correlate that a RING event received, let’s say in span 1 and channel 1, is probably going to be related to a PROGRESS message in span 2 channel 1 (which is the other side of the connection replying to the SETUP Q931 message). Furthermore, once the PROGRESS message is received, Asterisk must launch a regular channel and then create a DAHDI conference (using DAHDI pseudo channels) to mix the audio transmitted by the CPE and NET and return this mixed audio as a single frame to any Asterisk application reading from the channel.

https://issues.asterisk.org/view.php?id=15971

Some interesting code snippet of the changes to Asterisk is where I create the DAHDI conference to do the audio mixing, which is later retrieved by the core via the channel driver interface interface dahdi_read().

This is done during dahdi_new() routine, when creating the new Asterisk channel.

        if (i->tappingpeer) {
                struct dahdi_confinfo dahdic = { 0, };
                /* create the mixing conference
                 * some DAHDI_SETCONF interface rules to keep in mind
                 * confno == -1 means create new conference with the given confmode
                 * confno and confmode == 0 means remove the channel from its conference */
                dahdic.chan = 0; /* means use current channel (the one the fd belongs to)*/
                dahdic.confno = -1; /* -1 means create new conference */
                dahdic.confmode = DAHDI_CONF_CONFMON | DAHDI_CONF_LISTENER | DAHDI_CONF_PSEUDO_LISTENER;
                fd = dahdi_open("/dev/dahdi/pseudo");
                if (fd < 0 || ioctl(fd, DAHDI_SETCONF, &dahdic)) {
                        ast_log(LOG_ERROR, "Unable to create dahdi conference for tapping\n");
                        ast_hangup(tmp);
                        i->owner = NULL;
                        return NULL;
                }
                i->tappingconf = dahdic.confno;
                i->tappingfd = fd;
 
                /* add both parties to the conference */
                dahdic.chan = 0;
                dahdic.confno = i->tappingconf;
                dahdic.confmode = DAHDI_CONF_CONF | DAHDI_CONF_TALKER;
                if (ioctl(i->subs[SUB_REAL].dfd, DAHDI_SETCONF, &dahdic)) {
                        ast_log(LOG_ERROR, "Unable to add chan to conference for tapping devices: %s\n", strerror(errno));
                        ast_hangup(tmp);
                        i->owner = NULL;
                        return NULL;
                }
                dahdic.chan = 0;
                dahdic.confno = i->tappingconf;
                dahdic.confmode = DAHDI_CONF_CONF | DAHDI_CONF_TALKER;
                if (ioctl(i->tappingpeer->subs[SUB_REAL].dfd, DAHDI_SETCONF, &dahdic)) {
                        ast_log(LOG_ERROR, "Unable to add peer chan to conference for tapping devices: %s\n", strerror(errno));
                        ast_hangup(tmp);
                        i->owner = NULL;
                        return NULL;
                }
                ast_log(LOG_DEBUG, "Created tapping conference %d with fd %d between dahdi chans %d and %d for ast channel %s\n",
                                i->tappingconf,
                                i->tappingfd,
                                i->channel,
                                i->tappingpeer->channel,
                                tmp->name);
                i->tappingpeer->owner = i->owner;
       }

Later we sent the Asterisk channel to the dial plan.

                /* from now on, reading from the conference has the mix of both tapped channels, we can now launch the pbx thread */
                if (ast_pbx_start(c) != AST_PBX_SUCCESS) {
                        ast_log(LOG_ERROR, "Failed to launch PBX thread for passive channel %s\n", c->name);
                        ast_hangup(c);
                }
                break;

And finally when the Asterisk core requests a frame from chan_dahdi we return the mixed audio.

       /* if we have a tap peer we must read the mixed audio */
        if (p->tappingpeer) {
                /* passive channel reading */
                /* first read from the 2 involved dahdi channels just to consume their frames */
                res = read(p->subs[idx].dfd, readbuf, p->subs[idx].linear ? READ_SIZE * 2 : READ_SIZE);
                CHECK_READ_RESULT(res);
                res = read(p->tappingpeer->subs[idx].dfd, readbuf, p->subs[idx].linear ? READ_SIZE * 2 : READ_SIZE);
                CHECK_READ_RESULT(res);
                /* now read the mixed audio that will be returned to the core */
                res = read(p->tappingfd, readbuf, p->subs[idx].linear ? READ_SIZE * 2 : READ_SIZE);
        } else {
                /* no tapping peer, normal reading */
                res = read(p->subs[idx].dfd, readbuf, p->subs[idx].linear ? READ_SIZE * 2 : READ_SIZE);
        }

Finally, you can use regular dial plan Asterisk rules to Record() the conversation, Dial() to someone interested in auditing the call etc. Of course, any audio transmitted to this passive channel will be dropped, therefore using applications like Playback() in this channels just don’t make sense, you can still do it though, but all the audio will be silently dropped. Also remember this is a regular channel (that just happens to ignore any transmitted media or signaling), therefore you can still retrieve the ANI, DNID etc.

[from-pstn]
exten => s,1,Answer()
;exten => s,n,Dial(SIP/moy)
exten => s,n,Record(advanced-recording%d:wav)
exten => s,n,Hangup()
exten => _X.,1,Goto(s,1)

The configuration required is minimal. Sangoma board configuration is described here. It’s not much different than configuring a regular T1/E1 link though. In the Asterisk side, you just have to specify the parameter “tappingpeerpos=next” or “tappingpeerpos=prev” in chan_dahdi.conf to specify which is the peer tapping span for the current span. If you set “tappingpeerpos=no” or any other value for that matter, tapping will be disabled for that span (and then will be a regular active span).

I have the code already in 3 branches. One branch for libpri and 2 branches for Asterisk, one based on trunk and the other in Asterisk 1.6.2, keep in mind that the one from 1.6.2 has a slightly different configuration at this point, the parameter to enable tapping is “passive=yes”, this does not let you specify if the peer tapping device is the next or previous one, therefore assumes your tapping spans always start at an even number (0, 2, 4 etc), I will change that soon, hopefully …

http://svn.digium.com/svn/asterisk/team/moy/dahdi-tap-trunk
http://svn.digium.com/svn/asterisk/team/moy/dahdi-tap-1.6.2
http://svn.digium.com/svn/libpri/team/moy/tap-1.4

Now everytime a new call is detected you will receive a call that you can send wherever you want :)

Enjoy!

New Project – Sangoma Bridge

Wednesday, September 9th, 2009

A couple of months ago I wrote a little application for Regulus Labs. The application is a simple daemon bridge between Sangoma E1 devices receving ISDN PRI calls and a TCP IP server. Everything received on the telephony side was simply bridged to the configured TCP IP server. The bridge supports PRI voice calls and V.110 data calls.

Even when the application is simple in nature, learning about V.110 to get it to work was interesting :)

Today I made the project public ( thanks to Tzury Bar Yochay from Regulus Labs) in google code:

http://code.google.com/p/sbridge/

Hopefully somebody else will find it useful.