Archive for the ‘GNU/Linux General’ Category

GNU autotools tip

Monday, August 31st, 2009

When you create a program or library for Linux you may need the GNU autotools (autoconf, automake, libtool, etc.) to detect environment settings. These tools can be a pain in the ass when you start with them (and probably later too). Something that has worked for me is to create a bootstrap.sh script like this:

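#!/bin/bash
# copy (instead of symlinking) the libtool helper files into the source tree
libtoolize --force --copy
# aclocal.m4 has to exist before autoheader, automake and autoconf run
aclocal
autoheader
automake -f --copy --add-missing
autoconf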

The --force --copy for libtoolize and -f --copy --add-missing for automake will help you to not depend on symbolic links that are created by libtoolize and automake and that may not be present on the target machine where your code will be built.

I suppose there are valid reasons not to use those options, but using them has saved me a lot of hassle.

OpenR2 and OpenZap now integrated – MFCR2 support for FreeSWITCH

Friday, August 21st, 2009

After putting this off for several weeks, I finally spent some quality time getting OpenZAP to work with OpenR2. The result is now available in the openzap project svn trunk:

http://svn.openzap.org/svn/openzap/trunk/

I also created some basic documentation about how to set it up: http://wiki.freeswitch.org/wiki/OpenZAP_OpenR2

This means that from now on FreeSWITCH supports MFC-R2 signaling with the same stack that Asterisk has been using since 1.6.2.

I still need to do some work on the documentation and lots of stress testing, but you can start playing with it and bugging me if it does not work :-)

Why does Asterisk consume 100% CPU?

Wednesday, May 6th, 2009

I don’t know :)

But people have asked me this a couple of times lately and my answer is always “I don’t know”. However, ps can give you more information about it. In fact, this works for any application you want to debug when it goes crazy.

First, check which thread (Asterisk is a multi threaded application) is going crazy.

# ps -LlFm -p `pidof asterisk`

That should show you the percentage of CPU being used by each Asterisk thread in the column named “C”. Write down the LWP column value for the thread you are interested in (LWP is the light weight process number, roughly speaking, the thread id). Now that you have the thread id, you need to know what that thread is doing.

# pstack `pidof asterisk` > /tmp/asterisk.stack.txt

That will dump the stack state of the asterisk process threads to the /tmp/asterisk.stack.txt file. If you don’t have the pstack command, google for it; I think on CentOS it is as easy as yum install pstack.

Then open the file and search for the LWP that you just wrote down. Hopefully you will find some hints that let you know how to avoid it or at least a lot more information to post in bugs.digium.com

UPDATE:
One of the guys who asked this question later told me what he found:

Thread 10 (Thread 0x41d8f940 (LWP 3406)):
#0 0x00000033ce2ca436 in poll () from /lib64/libc.so.6
#1 0x00000000004933c0 in ast_io_wait ()
#2 0x00002aaabd9510cd in network_thread ()
#3 0x00000000004f8b2c in dummy_start ()
#4 0x00000033cee06367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000033ce2d2f7d in clone () from /lib64/libc.so.6

A quick grep -rI "network_thread" in the Asterisk source code reveals that this function belongs to chan_iax.c. Disabling chan_iax.so in modules.conf is a good workaround for his problem; however, further debugging would be needed to determine why that thread is looping like that.

G729A and G723.1 support for FreeSwitch

Sunday, February 8th, 2009

The past weekend I spent some time writing a module for the FreeSwitch project in order to support the G729A codec. This codec is patent encumbered; however, my goal was not to do this in software, but just to create the software interface for FreeSwitch to talk to a PCI card manufactured by Digium that does the transcoding for G729A and G723.1.

This is the data sheet for the TC400B board. The programming interface to access the encoders and decoders is not documented (or at least I could not find any documentation), but having the source code of the Asterisk module that uses that very same board is enough.

This board exposes its available encoders to the DAHDI / Zaptel core driver, which in turn exposes all transcoders registered by the boards through the Linux filesystem in /dev/dahdi/transcode or /dev/zap/transcode, depending on whether you have Zaptel or DAHDI drivers.

Here I want to explain the few interfaces required to use this board.

The first thing you usually want to do is verify that there are encoders and decoders available. This is done through an ioctl that requests information about the available transcoders.

        struct dahdi_transcoder_info info = {0};
        int fd, res;

        fd = open("/dev/dahdi/transcode", O_RDWR);
        if (fd < 0) {
                fprintf(stderr, "Failed to open dahdi transcode device\n");
                exit(1);
        }
        /* ask about transcoder 0, 1, 2... until the ioctl fails */
        for (info.tcnum = 0; !(res = ioctl(fd, DAHDI_TC_GETINFO, &info)); info.tcnum++) {
                printf("Found transcoder '%s', numchannels = %d, dstfmts = %d, srcfmts = %d.\n", info.name,
                                info.numchannels, info.dstfmts, info.srcfmts);
        }

        close(fd);

The driver will let you know the number of encoders and decoders, and the source and destination formats. The format masks can be found in /usr/include/dahdi/kernel.h. That may change in future versions: I discussed this with one of the DAHDI developers, since I think this should be in dahdi/user.h and not in kernel.h, given that it is a user space interface.

Once you know which transcoders are available you can request an encoder or decoder, or both.

        int encoder_fd, decoder_fd;
        struct dahdi_transcoder_formats g729_encoder = {0};
        struct dahdi_transcoder_formats g729_decoder = {0};

        g729_encoder.srcfmt = DAHDI_FORMAT_ULAW;
        g729_encoder.dstfmt = DAHDI_FORMAT_G729A;

        g729_decoder.srcfmt = DAHDI_FORMAT_G729A;
        g729_decoder.dstfmt = DAHDI_FORMAT_ULAW;

        /* one descriptor for the encoder... */
        encoder_fd = open("/dev/dahdi/transcode", O_RDWR);
        if (encoder_fd < 0) {
                printf("Failed to open transcode device\n");
                exit(1);
        }
        if (ioctl(encoder_fd, DAHDI_TC_ALLOCATE, &g729_encoder)) {
                printf("Failed to allocate encoder\n");
                close(encoder_fd);
                exit(1);
        }

        /* ...and another one for the decoder */
        decoder_fd = open("/dev/dahdi/transcode", O_RDWR);
        if (decoder_fd < 0) {
                printf("Failed to open transcode device\n");
                close(encoder_fd);
                exit(1);
        }
        if (ioctl(decoder_fd, DAHDI_TC_ALLOCATE, &g729_decoder)) {
                printf("Failed to allocate decoder\n");
                close(decoder_fd);
                close(encoder_fd);
                exit(1);
        }

Finally, once allocated, you just have to write chunks of ulaw data and read chunks of encoded g729 data, and vice versa for the decoder. You can choose whether the device will transcode ulaw or alaw to g729 or g723 by manipulating the srcfmt and dstfmt members of dahdi_transcoder_formats. Just remember that ulaw or alaw encoded data requires 8 times more bytes than g729 encoded data; therefore, if you write a 20ms frame of alaw (160 bytes at a sampling rate of 8000hz) you will read just 20 bytes of g729 encoded data. Of course, the same applies when you decode a g729 frame, or for g723 (for which alaw and ulaw require 12 times more space).

The module is available here under the MPL license. Also, the module was committed to trunk yesterday in the FreeSwitch SVN repository. I want to thank Voiceway for sponsoring the module and Neocenter for providing the hardware to test it.

A Tale of Two Bugs

Sunday, May 25th, 2008

It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the age of bug-hunting!

Recently I fixed 2 bugs. Yeah, I know I spend a lot of time fixing bugs, but these 2 were quite interesting to me, not because of the bugs themselves, but rather because of some stuff I learned in the process, like the implementation of variadic functions and how the C++ compiler optimizes certain stuff, unveiling odd bugs.

Bug 1

Let’s analyze the first one. It was a bug I had with a Unicall R2 installation on 64 bits. The problem was simple: as soon as I loaded chan_unicall.so, Asterisk crashed :-)

After running Asterisk under gdb I found that the crash happened inside the libc function strlen, which was being called from uc_log(), the Unicall logging function. Like most C logging functions, uc_log is a variadic function. uc_log does not do anything complicated; it is mostly just a wrapper around vsnprintf, the variable arguments were just passed on to vsnprintf, and that is where the crash was occurring. So, how can one see the arguments a variadic function receives using gdb? First, one must know how variadic functions are implemented by the compiler and platform you are working on.

The most common implementation of variadic functions in C is to define va_list as an unsigned char * pointing just past the last fixed argument of the function. Each call to va_arg() retrieves the next chunk of memory of the specified size and advances va_list to the start of the next argument; therefore, displaying the arguments is just a matter of printing the memory area after the last fixed argument. However, AMD64 has a different implementation: va_list is an array of 1 structure with members:

.gp_offset
.fp_offset
.overflow_arg_area
.reg_save_area

gp_offset is how many bytes after reg_save_area the next integer or pointer argument is. To print the first variable argument, which we know is an “int”, we do:

(gdb) p *(int *)(((char *)arg_ptr[0].reg_save_area)+arg_ptr[0].gp_offset)

However, gp_offset is only incremented when the va_arg() macro is called. If you want to see more arguments, you must add to that offset the number of bytes you know the previous arguments take. In the case of uc_log, the initial value of gp_offset is 24, probably because it receives 3 fixed arguments (8 bytes * 3). So, the first variable argument starts at .reg_save_area + 24, the second at .reg_save_area + 32 (we’re on a 64 bit machine).

So, what about .fp_offset and .overflow_arg_area? Well, .reg_save_area is quite limited: it only holds the arguments that were passed in processor registers, so for integer and pointer arguments you can never go beyond .gp_offset == 40, which means that will only work for up to 6 arguments (including the fixed ones). .fp_offset plays the same role as .gp_offset for floating point arguments, and .overflow_arg_area points to any subsequent arguments, which are passed on the stack. Well, that’s enough, let’s get straight to the point. The crash was caused because unicall.h includes the following prototype:

extern const char *uc_statet2str(int state);

That function’s return value was passed to uc_log(…., uc_state2str()) … so what’s the issue? Well, read the prototype once again and how uc_log used it. It is not a typo here in my blog: the prototype really is uc_statet2str and the function call is uc_state2str. There is indeed a typo in the header file, so when compiling libmfcr2 the compiler saw no prototype for uc_state2str and defaulted its return type to “int” instead of const char *. On a 64 bit platform an int is 4 bytes smaller than a char *, so the returned pointer was truncated, causing a crash due to an invalid memory read.

Bug 2

This one is easier to explain with a chunk of code. Can you tell what’s wrong with it and what the possible outputs are when running it as “./bug t”?


#include <stdio.h>
#include <string.h>

#define SIZE 100

int main(int argc, char *argv[])
{
        char *bufptr = NULL;
        if (argc == 2) {
                char inblock_buff[SIZE];
                bufptr = inblock_buff;
                strcpy(bufptr, "some buffer");
        }

        printf("buffer: %s\n", bufptr);
        if (argc == 2 && argv[1][0] == 't') {
                char otherbuff[SIZE];
                otherbuff[0] = 0;
        }

        printf("buffer: %s\n", bufptr);
        return 0;
}

Indeed, the output depends on how you compile it and probably even on the compiler implementation. If you compile this code on Linux with gcc 4.1.2 as gcc -O2 bug.c -o bug, and then run it as ./bug t

The output is:

buffer: some buffer
buffer: some buffer

But, compiling without optimizations (gcc -O0 bug.c -o bug) the output is:

buffer: some buffer
buffer:

The bug is that bufptr is used after inblock_buff has gone out of scope, which is undefined behavior: the compiler is free to reuse that stack space for otherbuff. When the second if() block is optimized out, the memory of the block variable inblock_buff is not overwritten, so bufptr keeps pointing to “some buffer” and the code seems to “work”. With -O0 the second if() block is not optimized out and the bug arises: bufptr points to a 0 char and nothing is printed. In my particular case this buffer was the keyboard input of a 5250 session; hence, in some cases the keyboard input was just ignored.

Asterisk with MFC/R2 in chan_zap

Friday, April 25th, 2008

I’ve been working lately on a library for MFC/R2 telephony signaling. I named this library “OpenR2”. My goal is to include support for this signaling in the Asterisk project and eventually in FreeSwitch if possible. I just created a new issue in the bugtracker with the first patch I have to give MFC/R2 support to the Asterisk channel driver chan_zap. Hopefully this will eventually be the standard MFC/R2 implementation for Asterisk and it will finally “just work”.

If you are one of the many people with R2 issues in Mexico or another country, you should consider giving this a try.

The code of the library is LGPL and is temporarily available for download from http://www.moythreads.com/openr2-april21.tar.gz.

I am in the process of getting an SVN account and will post the link later.

Patch for chan_zap: http://www.moythreads.com/chan_zap-mfr2.patch

All you have to do is download Asterisk from this branch: http://svn.digium.com/svn/asterisk/team/markster/mfr2

Then apply the patch. You will also need zaptel from this branch: http://svn.digium.com/svn/zaptel/branches/1.4

After applying the patch, please run ./bootstrap.sh in the Asterisk root directory. Then ./configure --prefix=/usr --with-openr2=/usr

Please report feedback in the bugtracker or contact me via e-mail or IM. My user at gmail is quite easy to remember: moises.silva

ViEmu Rocks

Wednesday, March 26th, 2008

Since I joined IBM a bit more than a year ago, I knew I would have to code for Windows sooner or later. It was sooner than I thought: it has been a year now since I started coding for both Linux & Windows. Coding C++ in Visual Studio is a pain compared to using VIM and GDB on Linux; however, I just found a Visual Studio plugin that is more than worth the 80 bucks I paid: http://www.viemu.com/

If you are a fan of vim as I am, then you also will enjoy reading: http://www.viemu.com/a-why-vi-vim.html, that’s where I first saw the Vi Gang sign!

Vi Gang Sign

PInvoke ( How to Call C from C# )

Monday, February 4th, 2008

I will be coding some C# stuff for Windows this year. We have a bunch of C/C++ APIs which we want to make available to our customers from C#. However, since I have known about Mono for a while, I quickly realized that 90% of the C# interfaces I will code on Windows for our C/C++ APIs can be made available for our Linux product as well. This post will briefly show how to call C/C++ code from C#.

As most readers probably know, C# is one of the primary languages of the .NET platform; thus, it runs in a managed environment and cannot call unmanaged code without some intermediate mechanism. But don’t fear, it is quite easy: that mechanism is PInvoke, which stands for Platform Invoke. Let’s see an example of how it is done on Linux.

1. Create a C file, libtest.c with this content:


#include <stdio.h>

void print(const char *message)
{
        printf("%s\n", message);
}

That’s a simple pseudo-wrapper for printf, but it represents any C function in the library you want to call. If you have a C++ function, don’t forget to declare it extern "C" to avoid mangling the name.

2. Compile it as a shared library: gcc -fPIC -shared libtest.c -o libtest.so

3. Let’s create the C# file that will call our C API. It’s quite easy using PInvoke; all we need to do is define our entry points.

using System;
using System.Runtime.InteropServices;

public class Tester
{
        [DllImport("libtest.so", EntryPoint="print")]
        static extern void print(string message);

        public static void Main(string[] args)
        {
                print("Hello World C# => C++");
        }
}

We define a test class that declares a method “print”. However, we do not write the body of the method, since we declare it extern, and it is static because it does not depend on the class instance data. Using the C# attribute DllImport we specify the DLL ( on Linux, the shared object ) where the method is defined.

4. Compile the C# file: mcs test.cs

5. Run it: mono test.exe

Unless you have the library libtest.so in a standard library path like “/usr/lib”, you are likely to see a System.DllNotFoundException. To fix this you can move your libtest.so to /usr/lib or, better yet, just add your CWD to the library path: export LD_LIBRARY_PATH=`pwd`

6. Run it again :) … now you should see the hello world C# => C++ message.

The DllImport attribute supports other arguments to tweak its behavior; refer to MSDN for the DllImport documentation. Also keep in mind this is an extremely simple example; when more complex data types are involved we need to read about marshaling.

So that’s it, in my next post I will write about how to call C# code from C/C++ using COM Interop and managed C++.

SHMConfig

Tuesday, October 23rd, 2007

Well, found the user space configuration to instruct X to grab or not grab the Synaptics touchpad. SHMConfig = off did the trick :)

Linux Input Subsystem and my TouchPad

Sunday, October 21st, 2007

Heck, I hate when things suddenly stop working for no apparent reason ( and who doesn’t? ). But this time I have learned some stuff and finding a workaround has been fun. Some weeks ago my touchpad just stopped working as soon as I logged into my Gnome session. The touchpad even worked at the login screen, but it stopped working as soon as I typed my password and the login process started. To add some oddity to this problem, if I pressed “CTRL + ALT + F1” to get a console, my touchpad worked again! But when I pressed “CTRL + ALT + F7” to get back to my Gnome session it stopped working again, WTF??? Fortunately the trackpoint still worked at all times, so, even though I liked the touchpad more, I got used to the trackpoint quickly and even felt it had some advantages. However, I was still puzzled, because once in a while I wanted to scroll down without having to press the down arrow on my keyboard. So, as soon as I had some time, I started investigating the issue.

The first thing I did was open the xorg.conf configuration and the X logs; probably some IBM Open Client ( IBM Red Hat based distro ) update had broken my config or something like that. I did not find errors in the logs, but I found that the mouse input device was configured as “/dev/input/mice”, as usual. After googling for a couple of minutes I got tired of searching, so I decided to do a more low level investigation instead of just blindly tweaking the X configuration file and end user stuff like that. I wrote a small test program a bit like this:


fd = open("/dev/input/mice", O_RDONLY);

while ( 1 ) {
    /* select() modifies the set, so rebuild it on every iteration */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    res = select(fd + 1, &rfds, NULL, NULL, NULL);
    if ( res > 0 && FD_ISSET(fd, &rfds) ) {
        res = read(fd, read_buffer, sizeof(read_buffer));
        printf("got some data\n");
    }
}

So, this just served the purpose of confirming that no event was generated when I moved my finger on the touchpad, while events were generated with the trackpoint. So, was this a driver problem? I did a quick ‘ls -la’ to find out the major and minor of “/dev/input/mice”; however, I did not find a way to map the major/minor to the kernel driver that takes care of the file operations ( read(), write(), blah() ). I asked several people and nobody knew how to find out the driver/kernel module that takes care of creating a /dev/ entry given the major and minor. Obviously the kernel can, but I did not find any user space tool for that.

In order to try to find the driver I downloaded the kernel sources that most closely matched my installed kernel ( 2.6.18 ). After some searching in the kernel tree I found the “drivers/input/” directory. I was sure the answer to my problem was somewhere in there. Since the touchpad stopped working every time I entered graphics mode, I thought X was doing some nasty stuff, maybe some ioctl() to disable the touchpad if a trackpoint was found as well? I found some forum posts saying some people want to disable the touchpad because they accidentally touch it when typing and that disturbs them, so I started considering the possibility of X intentionally disabling my touchpad.

I did a “grep” searching for ioctl definitions in the “drivers/input” directory and found several; evdev.c was particularly interesting, since events were not being received in user space after all. I did not understand shit of what the ioctls were doing there, so I searched for documentation in Documentation/input/input.txt, and voilà!, things started to make sense. I did some googling on the ioctls defined in evdev.c because input.txt did not say a word about them. I found these interesting articles:

http://www.linuxjournal.com/article/6396
http://www.linuxjournal.com/article/6429

Interesting, but they still did not say anything about why events might not be received in user space. So, back to the code in evdev.c, I paid more attention to the ioctls and started experimenting with some of them like EVIOCGVERSION and EVIOCGNAME. I found some documentation in include/linux/input.h, including a comment about EVIOCGRAB saying “Grab/Release device”. Hmm, that sounded interesting: what are the implications of grabbing a device? The articles did not mention that ioctl, so, let’s try it!

/* fd is /dev/input/event0 ( my keyboard )*/
ioret = ioctl(fd, EVIOCGVERSION, &version);
ioret = ioctl(fd, EVIOCGNAME(sizeof(device_name)), device_name);

/* a non-zero argument grabs the device, 0 releases it */
ioret = ioctl(fd, EVIOCGRAB, 1);
if ( -1 == ioret ) {
  perror("ioctl()");
}

printf("ver: %d, ret = %d\n", version, ioret);
printf("device name is: %s\n", device_name);

I paid the price for my stupidity: testing with my keyboard was not a smart thing, really. My keyboard stopped working and I had to brute force a reboot. So, it seems that the EVIOCGRAB ioctl gets exclusive rights over the input events of the specified device; thus, X was not able to get my keyboard events. The only process able to receive them was my program, and it was not doing anything useful with them except printing the “got some data” message.

Next I tried with /dev/input/event1 ( my touchpad ) and got a “Device or resource busy” error, then tried with /dev/input/event2 ( my trackpoint ) and it succeeded. I connected a USB mouse and it created /dev/input/event3. I tried with that one and it also worked. So, it was clear to me that someone ( X of course ) was grabbing my touchpad for some reason and ignoring its events. Error or feature? I still don’t know. So, I wrote a “touchpad-takeover” program that does not allow X to take control of my touchpad and then, once I am logged in, releases the device. That way events start flowing again to any application waiting on /dev/input/event1 and /dev/input/mice ( this device is a mix of the events of all the mouse devices ). Maybe that way X will learn to play nice with me :)

Here is what touchpad-takeover looks like:

http://www.moythreads.com/htmlcode/touchpad-takeover.html

Now, every time my laptop boots, I launch that program in the background with /dev/input/event1 as its argument. X reports an error saying “Synaptics can’t grab event device, errno=16”. Aha! How does that feel, X??! Huh? … As soon as I start my Gnome session I send SIGINT to my program (kill -2) so it releases /dev/input/event1 and I am able to use my touchpad :)

Isn’t that odd? If X is able to grab my touchpad device, the touchpad does not work. If I grab the device first so X cannot grab it and then, once logged in, I release the device, I can work with my touchpad. Bug or feature? I think some X configuration might alter that behaviour, but I have not found it so far. Maybe with some more time on my hands I will dig into the X server code to find out what is going on. Even though I am able to use my touchpad, the scroll down feature is missing :( . Before this strange stuff happened, I could move my finger along the right edge of the touchpad to scroll down my screen/documents; now I can’t.

Long post … time to go back to Guadalajara. I arrived in Puebla yesterday to give an Asterisk talk at ENLi. I was planning to go back last night; however, Tato did not show up to give his Asterisk presentation. Fortunately I talked with Sandino this morning and he said Tato is OK, so I gave the presentation this morning in his place. Gotta go, I’m quite tired …