Archive for the ‘linux’ Category

I hate SELinux

Thursday, December 17th, 2009

I am not a security-savvy person. Even though I know pretty well how to code defensively to avoid security issues in my C code, my security knowledge of a Linux system is pretty average (use a firewall, do not run services as root, etc.). No wonder I really don’t know how to use or configure SELinux. But there is one thing I know about it: it can be a pain in the ass sometimes. This blog post describes 2 problems I have faced with SELinux. The easy solution would have been to disable SELinux, so I’ll start by showing you how to disable it in case you don’t want to mess with it at all.

- To disable SELinux, edit /etc/selinux/config and change the SELINUX setting from enforcing to disabled (the change takes effect after a reboot):

SELINUX=disabled

Be aware that you are disabling security “features”, whatever that means. So you may want to read this other article about disabling SELinux.

I wasn’t lucky enough to have disabling SELinux as an option. Developing with SELinux enabled is a good idea anyway, so you can notice compatibility problems with SELinux in your development environment before a customer that really *needs* SELinux enabled discovers them in your software, which, from the customer’s point of view, is a plain bug.

The first problem I noticed, after some CentOS upgrade, a hard drive change or something along those lines, was that the “ping” command wasn’t working. It’s been a while since I had the problem, so I don’t quite remember the exact error when pinging other systems, but it was most likely something like permission denied. Other network commands probably did not work either, but ping was the only one I noticed at that moment. So I used the good old strace to find out what was causing the problem.

The underlying ping problem was that the open() system call was failing with EACCES when trying to open /etc/hosts. However, I was able to “cat /etc/hosts”, so it wasn’t a simple file permission problem but a somewhat more complex SELinux problem. Eventually I found out that the solution was:

restorecon -v /etc/hosts

At which point did the SELinux security context for this file get screwed up? I certainly don’t know, but that command restored it. The -Z option of the ls command will show you the SELinux security context of any file.

ls -Z /etc/hosts

The second problem was that some libraries of one of the programs I am responsible for were not being loaded. Again, the problem was due to permission denied errors, this time when loading the shared libraries that required text relocation.

The solution was to recompile the shared libraries with the -fPIC option.

I am sure SELinux has its uses. However, I have the feeling that it sometimes makes things more complicated than needed in some environments. I recommend reading this blog post and particularly the comments in there.

Quick tip for debugging deadlocks

Sunday, September 27th, 2009

If you ever find yourself with a deadlock in your application, you can use gdb to attach to it. Sometimes you will find one of the threads stuck trying to lock mutex x. Then you need to find out which thread is currently holding x and therefore blocking your thread (and, equally important, why that other thread is not releasing it).

At least on recent libc implementations in Linux, the mutex object seems to have a member named “__owner”. Let me show you what I recently saw when debugging a deadlocked application.

(gdb) f 4
#4  0x0805ab46 in ACE_OS::mutex_lock (m=0xa074248) at include/ace/OS.i:1406
1406      ACE_OSCALL_RETURN (ACE_ADAPT_RETVAL (pthread_mutex_lock (m), ace_result_),
(gdb) p *m
$8 = {__data = {__lock = 2, __count = 0, __owner = 17828, __kind = 0, __nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
  __size = "\002\000\000\000\000\000\000\000\244E\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2}

We can see that the __owner is 17828. This number is the LWP (light-weight process) id of the thread holding the lock. gdb shows the LWP of every thread in the output of “info threads”, so now you can switch to that thread, examine its stack and find out why that thread is also stuck.

This example also brings up a regular point of confusion for some Linux application developers: what is the difference between the LWP and the POSIX thread id (the pthread_t type in pthread.h)? The difference is that pthread_t is a user space concept. It is simply an identifier that the library implementing POSIX threads uses to refer to the thread and its resources, state, etc. The LWP, however, is an implementation detail of how the Linux kernel implements threads, which is done through the “thread group” concept and LWPs, which are processes that share memory pages and other resources with the other processes in the same thread group.

From the Linux kernel point of view the pthread_t value doesn’t mean anything. The LWP id is how you identify threads in the kernel, and LWPs share the same numbering as regular processes, since they are just a special type of process. Knowing this is useful when using utilities like strace. When you want to trace a particular thread of a multithreaded application, you need to provide the LWP of the thread you want to trace. A common mistake is to provide the process id, which in a multithreaded application is just the LWP of the first thread in the application (the one that started executing main()).

Here is how you get each identifier in a C program:

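#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
  pthread_t tid = pthread_self();
  long lwp = syscall(SYS_gettid);
  printf("LWP id is %ld\n", lwp);
  /* pthread_t is an opaque type; on Linux/glibc it is an unsigned long */
  printf("POSIX thread id is %lu\n", (unsigned long)tid);
  return 0;
}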

It’s important to note that getting the POSIX thread id is much faster than getting the LWP, because pthread_self() is just a library call and libc most likely has this value cached somewhere in user space, so there is no need to go down to the kernel. As you can see, getting the LWP requires a call to the syscall() function, which effectively executes the requested system call. This is expensive (well, compared with the time required to enter a simple user space function).

New Project – Sangoma Bridge

Wednesday, September 9th, 2009

A couple of months ago I wrote a little application for Regulus Labs. The application is a simple daemon that bridges Sangoma E1 devices receiving ISDN PRI calls and a TCP/IP server. Everything received on the telephony side is simply bridged to the configured TCP/IP server. The bridge supports PRI voice calls and V.110 data calls.

Even though the application is simple in nature, learning about V.110 to get it to work was interesting :)

Today I made the project public (thanks to Tzury Bar Yochay from Regulus Labs) on Google Code:

http://code.google.com/p/sbridge/

Hopefully somebody else will find it useful.

Debugging information in separate files

Monday, August 31st, 2009

Debugging information in Linux ELF binaries is usually stored in the binary itself. This has been really convenient for me. For example, I always compile my openr2 library with -ggdb3 -O0. I don’t care about optimizations or the increase in size of the binary, and users can always change those flags using CFLAGS when configuring openr2. It is convenient because if my users ever get a core dump, I am able to jump right in, get a useful backtrace and examine the stack. Alternatively, they can get the stack trace themselves and send it to me without worrying about anything other than launching gdb with the right arguments.

However, when you ship non-open source software, or you’re just concerned with the size of all the debugging information in lots of libraries, you want to separate the debugging information from the binary holding the program or library itself. In Windows this is the default behavior you get with the well-known PDB (Program Database) files. For Linux, though, you need some tricks to get the debugging information into a separate file. This is of course what most distributions do: they provide an extra package with the debugging information, so when you install a package you get just the binary code, and if you need to debug it you download the debugging package.

If you ever need this, you can follow the instructions in this web page to get it to work:

http://sources.redhat.com/gdb/current/onlinedocs/gdb_17.html#SEC166

The way I solved it for our internal build system is just to always compile with -ggdb3 and then:

1. Create a copy of the debugging symbols in a separate binary

objcopy --only-keep-debug somelibrary.so somelibrary.so.dbg

2. Remove the debugging information from the code binary.

objcopy --strip-debug somelibrary.so

3. Add a reference to the code binary so gdb knows where to look for the debugging information

objcopy --add-gnu-debuglink somelibrary.so.dbg somelibrary.so

This last step simply puts a file name reference (plus a checksum of the debug file) inside the ELF binary so GDB (or some other debugger) knows which file has the debugging information for this .so (or for an executable, if that’s what you’re building). The Red Hat web page explains more advanced techniques to make sure you don’t end up with a version mismatch between the debugging information and your library or executable.

GNU autotools tip

Monday, August 31st, 2009

When you create a program or library for Linux you may need the GNU autotools (automake, libtool, the configure script, etc.) to detect environment settings. These tools may become a pain in the ass when you start with them (and probably later too). Something I recommend, and that has worked for me, is to create a bootstrap.sh script like this:

#!/bin/bash
autoheader
libtoolize --force --copy
aclocal
automake -f --copy --add-missing
autoconf

The --force --copy options for libtoolize and -f --copy --add-missing for automake will help you to not depend on symbolic links that are created by libtoolize and automake and that may not be present on the target machine where your code will be built.

I suppose there is a valid reason to not use those options, but using them saved me a lot of hassle.

OpenR2 and OpenZap now integrated – MFCR2 support for FreeSWITCH

Friday, August 21st, 2009

After putting this off for several weeks, I finally spent some quality time getting OpenZAP to work with OpenR2. The result is now available in the openzap project svn trunk:

http://svn.openzap.org/svn/openzap/trunk/

I also created some basic documentation about how to set it up: http://wiki.freeswitch.org/wiki/OpenZAP_OpenR2

This means that from now on FreeSWITCH will support MFC-R2 signaling with the same stack that Asterisk has been using since 1.6.2.

I still need to do some work on the documentation and lots of stress testing, but you can start playing with it and bugging me if it does not work :-)

Why does Asterisk consume 100% CPU?

Wednesday, May 6th, 2009

I don’t know :)

But people have asked me this a couple of times lately and my answer is always “I don’t know”. However, ps can give you more information about it. In fact, this works for any application you want to debug when it is going crazy.

First, check which thread (Asterisk is a multi threaded application) is going crazy.

# ps -LlFm -p `pidof asterisk`

That should show you the % of CPU being used by each Asterisk thread in the column named “C”. Write down the LWP column value for the thread you are interested in (LWP is a light-weight process number, roughly speaking, the thread id). Now that you have the thread id, you need to know what that thread is doing.

# pstack `pidof asterisk` > /tmp/asterisk.stack.txt

That will dump the stack of every thread of the asterisk process to the /tmp/asterisk.stack.txt file. If you don’t have the pstack command, google for it; I think on CentOS it is as easy as yum install pstack.

Then open the file and search for the LWP that you just wrote down. Hopefully you will find some hints that let you know how to avoid the problem, or at least a lot more information to post on bugs.digium.com.

UPDATE:
One of the guys who asked this question later told me what he found:

Thread 10 (Thread 0x41d8f940 (LWP 3406)):
#0 0x00000033ce2ca436 in poll () from /lib64/libc.so.6
#1 0x00000000004933c0 in ast_io_wait ()
#2 0x00002aaabd9510cd in network_thread ()
#3 0x00000000004f8b2c in dummy_start ()
#4 0x00000033cee06367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000033ce2d2f7d in clone () from /lib64/libc.so.6

A quick grep -rI "network_thread" in the Asterisk source code reveals that this function belongs to chan_iax.c. Disabling chan_iax.so in modules.conf is a good workaround for his problem, but further debugging would be needed to determine why that thread is looping like that.

G729A and G723.1 support for FreeSwitch

Sunday, February 8th, 2009

The past weekend I spent some time writing a module for the FreeSwitch project in order to support the G729A codec in it. This codec is patent encumbered. However, my target was not to implement it in software, but just to create the software interface for FreeSwitch to talk to a PCI card manufactured by Digium that does the transcoding for G729A and G723.1.

This is the data sheet for the TC400B board. The programming interfaces to access the encoders and decoders are not documented (or at least I could not find any documentation), but it’s enough to have the source code of the Asterisk module that uses that very same board.

This board exposes its available encoders to the DAHDI / Zaptel core driver, which in turn exposes all transcoders registered by the boards through the Linux filesystem in /dev/dahdi/transcode or /dev/zap/transcode, depending on whether you have Zaptel or DAHDI drivers.

Here I want to explain the few interfaces required to use this board.

The first thing you usually want to do is verify that there are encoders and decoders available. This is done through an ioctl that requests information about the available transcoders.

        /* Ask the driver about each registered transcoder until the ioctl fails */
        struct dahdi_transcoder_info info = {0};
        int fd, res;

        fd = open("/dev/dahdi/transcode", O_RDWR);
        if (fd < 0) {
                fprintf(stderr, "Failed to open dahdi transcode device\n");
                exit(1);
        }
        for (info.tcnum = 0; !(res = ioctl(fd, DAHDI_TC_GETINFO, &info)); info.tcnum++) {
                printf("Found transcoder '%s', numchannels = %d, dstfmts = %d, srcfmts = %d.\n", info.name,
                                info.numchannels, info.dstfmts, info.srcfmts);
        }

        close(fd);

The driver will let you know the number of encoders and decoders and the source and destination formats. The format masks can be found in /usr/include/dahdi/kernel.h. That may change in future versions: I discussed this with one of the DAHDI developers, since I think these definitions should be in dahdi/user.h and not in kernel.h, given that this is a user space interface.

Once you know which transcoders are available you can request an encoder or decoder, or both.

        int encoder_fd, decoder_fd;

        struct dahdi_transcoder_formats g729_encoder = {0};
        struct dahdi_transcoder_formats g729_decoder = {0};

        /* The encoder takes ulaw and produces g729, the decoder does the opposite */
        g729_encoder.srcfmt = DAHDI_FORMAT_ULAW;
        g729_encoder.dstfmt = DAHDI_FORMAT_G729A;
        g729_decoder.srcfmt = DAHDI_FORMAT_G729A;
        g729_decoder.dstfmt = DAHDI_FORMAT_ULAW;

        /* Each transcoder channel needs its own descriptor */
        encoder_fd = open("/dev/dahdi/transcode", O_RDWR);
        if (encoder_fd < 0) {
                printf("Failed to open transcode device\n");
                exit(1);
        }
        if (ioctl(encoder_fd, DAHDI_TC_ALLOCATE, &g729_encoder)) {
                printf("Failed to allocate encoder\n");
                close(encoder_fd);
                exit(1);
        }

        decoder_fd = open("/dev/dahdi/transcode", O_RDWR);
        if (decoder_fd < 0) {
                printf("Failed to open transcode device\n");
                close(encoder_fd);
                exit(1);
        }
        if (ioctl(decoder_fd, DAHDI_TC_ALLOCATE, &g729_decoder)) {
                printf("Failed to allocate decoder\n");
                close(decoder_fd);
                close(encoder_fd);
                exit(1);
        }

Finally, once allocated, you just have to write chunks of ulaw data and read chunks of encoded g729 data, and vice versa. You can choose whether the device will convert ulaw or alaw to g729 or g723 by manipulating the srcfmt and dstfmt members of dahdi_transcoder_formats. Just remember that ulaw or alaw encoded data requires 8 times more bytes than g729 encoded data. Therefore, if you write a 20 ms frame of alaw (160 bytes at a sampling rate of 8000 Hz) you will read just 20 bytes of g729 encoded data. Of course, the same applies when you decode a g729 frame, and to g723 (for which alaw and ulaw require 12 times more space).

The module is available here under the MPL license. Also, the module was just committed to trunk yesterday in the Freeswitch SVN repository. I want to thank Voiceway for sponsoring the module and Neocenter for providing the hardware to test it.

A Tale of Two Bugs

Sunday, May 25th, 2008

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the age of bug-hunting!

Recently I fixed 2 bugs. Yeah, I know I spend a lot of time fixing bugs, but these 2 were quite interesting to me, not because of the bugs themselves, but rather because of some stuff I learned in the process, like the implementation of variadic functions and how the C++ compiler optimizes certain stuff, unveiling odd bugs.

Bug 1

Let’s analyze the first one. It was a bug I had with a Unicall R2 installation on a 64-bit system. The problem was simple: as soon as I loaded chan_unicall.so, Asterisk crashed :-)

After running Asterisk under gdb I found that the crash happened inside the libc function strlen, which was being called by uc_log(), the Unicall logging function. Like most C logging functions, uc_log is a variadic function. It does not do any complicated stuff; it is mostly just a wrapper around vsnprintf. The variable arguments were just passed on to vsnprintf, and that is where the crash was occurring. So, how can one see the arguments a variadic function receives using gdb? First, one must know how variadic functions are implemented by the compiler and platform you are working on.

The most common implementation of variadic functions in C simply defines va_list as an unsigned char * pointing to the memory right after the last fixed argument of the function. Each call to va_arg() retrieves the next chunk of memory of the specified size and advances va_list to the start of the next argument, so displaying the arguments is just a matter of printing the memory area after the last fixed argument. AMD64, however, has a different implementation: va_list is an array of 1 structure with members:

.gp_offset
.fp_offset
.overflow_arg_area
.reg_save_area

gp_offset is the offset in bytes from reg_save_area to the next argument that va_arg() will fetch. To print the first variable argument, which we know is an “int”, we do:

(gdb) p *(int *)(((char *)arg_ptr[0].reg_save_area)+arg_ptr[0].gp_offset)

However, gp_offset is only incremented after calling the va_arg() macro. If you want to see more arguments you must add to reg_save_area the number of bytes you know the previous arguments take. In the case of uc_log the initial value of gp_offset is 24, probably because it receives 3 fixed arguments (8 bytes * 3). So the first variable argument starts at .reg_save_area + 24 and the second at .reg_save_area + 32 (we’re on a 64-bit machine).

So, what about .fp_offset and .overflow_arg_area? Well, .reg_save_area is quite limited: it only holds the arguments that were passed in processor registers. On AMD64 six general-purpose registers carry integer and pointer arguments, so .gp_offset never points to an argument past offset 40, and this technique only works for the first 6 such arguments (including the fixed ones). .fp_offset plays the same role as .gp_offset for floating-point arguments, which are saved in a separate part of .reg_save_area, and .overflow_arg_area points to any further arguments, which are passed on the stack. Well, that’s enough, let’s get straight to the point. The crash was caused because unicall.h includes the following prototype:

extern const char *uc_statet2str(int state);

That function returned the value passed to uc_log(…., uc_state2str()). So what’s the issue? Well, read the prototype once again and compare it with how uc_log used it. It is not a typo here in my blog: the prototype really is uc_statet2str, while the function call is uc_state2str. The typo in the header file left uc_state2str undeclared, so when compiling libmfcr2 the compiler defaulted to a return type of “int” instead of const char *. On a 64-bit platform a char * takes 8 bytes and an int only 4, so the returned pointer was truncated and the crash was caused by an invalid memory read.

Bug 2

This one is easier to explain with a chunk of code. Can you tell what’s wrong with it and what possible outputs it will produce when run as “./bug t”?


#include <stdio.h>
#include <string.h>

#define SIZE 100

int main(int argc, char *argv[])
{

        char *bufptr = NULL;
        if (argc == 2) {

                char inblock_buff[SIZE];
                bufptr = inblock_buff;
                strcpy(bufptr, "some buffer");
        }

        printf("buffer: %s\n", bufptr);
        if (argc == 2 && argv[1][0] == 't') {

                char otherbuff[SIZE];
                otherbuff[0] = 0;
        }

        printf("buffer: %s\n", bufptr);
        return 0;
}

Indeed, the output depends on how you compile it, and probably on the compiler implementation as well. If you compile this code on Linux with gcc 4.1.2 as gcc -O2 bug.c -o bug and then run it as ./bug t, the output is:

buffer: some buffer
buffer: some buffer

But when compiling without optimizations (gcc -O0 bug.c -o bug) the output is:

buffer: some buffer
buffer:

When the second if() block is optimized out, the memory of the block variable inblock_buff is not overwritten, so bufptr still points to “some buffer” and the code seems to “work”. With -O0 the second if() block is not optimized out and the bug arises: otherbuff reuses the same stack memory, so bufptr points to a 0 char and nothing is printed. Either way the code is wrong, because inblock_buff stops existing at the end of its block and using bufptr afterwards is undefined behavior. In my particular case this buffer was the keyboard input of a 5250 session, hence in some cases the keyboard input was just ignored.

Asterisk with MFC/R2 in chan_zap

Friday, April 25th, 2008

I’ve been working lately on a library for the MFC/R2 telephony signaling. I named this library “OpenR2”. My goal is to include support for this signaling in the Asterisk project and eventually in FreeSwitch if possible. I just created a new issue in the bugtracker with the first patch I have to give MFC/R2 support to the Asterisk channel driver chan_zap. Hopefully this will eventually be the standard MFC/R2 implementation for Asterisk and finally it will “just work”.

If you are one of the many people with R2 issues in Mexico or another country, you should consider giving this thing a try.

The code of the library is LGPL and is temporarily available to download from http://www.moythreads.com/openr2-april21.tar.gz.

I am in the process of getting an SVN account and will post the link later.

Patch for chan_zap: http://www.moythreads.com/chan_zap-mfr2.patch

All you have to do is download Asterisk from this branch: http://svn.digium.com/svn/asterisk/team/markster/mfr2

Then apply the patch. You will also need zaptel from this branch: http://svn.digium.com/svn/zaptel/branches/1.4

After applying the patch, please run ./bootstrap.sh in the Asterisk root directory. Then ./configure --prefix=/usr --with-openr2=/usr

Please report feedback in the bugtracker or contact me via e-mail or IM. My Gmail user name is quite easy to remember: moises.silva