Archive for the ‘C/C++’ Category

Linux Input Subsystem and my TouchPad

Sunday, October 21st, 2007

Heck, I hate it when things suddenly stop working for no apparent reason (and who doesn’t?). But this time I learned some stuff, and finding a workaround was fun. Some weeks ago my touchpad just stopped working as soon as I logged into my Gnome session. The touchpad worked at the login screen, but it died as soon as I typed my password and the login process started. To add some oddity to the problem, if I pressed “CTRL + ALT + F1” to get a console, my touchpad worked again! But when I pressed “CTRL + ALT + F7” to get back to my Gnome session it stopped working again. WTF??? Fortunately the trackpoint kept working at all times, so even though I liked the touchpad better, I quickly got used to the trackpoint and even felt it had some advantages. However, it still bugged me, because once in a while I wanted to scroll down without having to press the down arrow on my keyboard. So, as soon as I had some time, I started investigating the issue.

The first thing I did was open xorg.conf and the X logs; probably some IBM Open Client (IBM’s Red Hat based distro) update had broken my config or something like that. I did not find any errors in the logs, but I found that the mouse input device was configured as “/dev/input/mice”, as usual. After a couple of minutes of googling I got tired of searching, so I decided to do some lower-level investigation instead of blindly tweaking the X configuration file and end-user stuff like that. I wrote a small test program, a bit like this:


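/* assumes <fcntl.h>, <unistd.h>, <sys/select.h> and <stdio.h> are included */
fd = open("/dev/input/mice", O_RDONLY);

while ( 1 ) {
    /* select() overwrites the set, so rebuild it on every iteration */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    res = select(fd + 1, &rfds, NULL, NULL, NULL);

    if ( res > 0 && FD_ISSET(fd, &rfds) ) {
        res = read(fd, read_buffer, sizeof(read_buffer));

        printf("got some data\n");
    }
}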

So, this just served the purpose of confirming that no events were generated when I moved my finger on the touchpad, while events were generated by the trackpoint. So, was this a driver problem? I did a quick ‘ls -la’ to find out the major and minor numbers of “/dev/input/mice”; however, I did not find a way to map the major/minor to the kernel driver that takes care of the file operations ( read(), write(), blah() ). I asked several people, and nobody knew how to find the driver/kernel module that is responsible for a /dev/ entry given its major and minor. Obviously the kernel can, but I did not find any user space tool for that.

In order to find the driver I downloaded the kernel sources that most closely matched my installed kernel ( 2.6.18 ). After some searching in the kernel tree I found the “drivers/input/” directory. I was sure the answer to my problem was somewhere in there. Since the touchpad stopped working every time I entered graphics mode, I thought X was doing some nasty stuff, maybe some ioctl() to disable the touchpad if a trackpoint was found as well? I found some forum posts from people who want to disable the touchpad because they accidentally touch it when typing and it disturbs them, so I started considering the possibility that X was disabling my touchpad intentionally.

I did a “grep” for ioctl definitions in the “drivers/input” directory and found several; evdev.c was particularly interesting, since events were not being received in user space after all. I did not understand shit of what the ioctls were doing there, so I searched for documentation in Documentation/input/input.txt, and voilà! Things started to make sense. I did some googling on the ioctls defined in evdev.c, because input.txt did not say a word about them. I found these interesting articles:

http://www.linuxjournal.com/article/6396
http://www.linuxjournal.com/article/6429

The articles were interesting, but they still did not say anything about why events might not reach user space. So, back in the code of evdev.c, I paid more attention to the ioctls and started experimenting with some of them, like EVIOCGVERSION and EVIOCGNAME. I found some documentation in include/linux/input.h, including a comment about EVIOCGRAB saying “Grab/Release device”. Hmm, that sounded interesting; what are the implications of grabbing a device? The articles did not mention that ioctl, so, let’s try it!

/* fd is /dev/input/event0 ( my keyboard )*/
ioret = ioctl(fd, EVIOCGVERSION, &version);

ioret = ioctl(fd, EVIOCGNAME(sizeof(device_name)), device_name);

/* EVIOCGRAB takes its argument by value: non-zero grabs the device, 0 releases it */
ioret = ioctl(fd, EVIOCGRAB, 1);
if ( -1 == ioret ) {
  perror("ioctl()");
}

printf("ver: %d, ret = %d\n", version, ioret);
printf("device name is: %s\n", device_name);

I paid the price for my stupidity; testing with my keyboard was not a smart thing, really. My keyboard stopped working and I had to brute-force reboot. So, it seems that the EVIOCGRAB ioctl is something like getting exclusive rights over the input events of the specified device. X was no longer able to get my keyboard events; the only process able to receive them was my program, and it was not doing anything useful with them except printing the “got some data” message.

Next I tried /dev/input/event1 ( my touchpad ) and got “Device or resource busy”, then tried /dev/input/event2 ( my trackpoint ) and it succeeded. I connected a USB mouse and it created /dev/input/event3. I tried that one and it also worked. So, it was clear to me that someone ( X, of course ) was grabbing my touchpad for some reason and ignoring its events. Error or feature? I still don’t know. So, I wrote a “touchpad-takeover” program that grabs the touchpad first, so X cannot take control of it, and then, once I am logged in, releases the device. That way events start flowing again to any application waiting on /dev/input/event1 and /dev/input/mice ( this device is a mix of the events of all the mouse devices ). Maybe that way X will learn to play nice with me :)

Here is what touchpad-takeover looks like:

http://www.moythreads.com/htmlcode/touchpad-takeover.html

Now, every time my laptop boots, I launch that program in the background with /dev/input/event1 as its argument. X reports an error saying “Synaptics can’t grab event device, errno=16”. Aha! How does that feel, X??! Huh? … As soon as I start my Gnome session I send SIGINT to my program (kill -2) so it releases /dev/input/event1, and I am able to use my touchpad :)

Isn’t that odd? If X is able to grab my touchpad device, the touchpad does not work. If I grab the device first so X cannot grab it, and then release it once logged in, I can work with my touchpad. Bug or feature? I think some X configuration might alter that behaviour, but I have not found it so far. Maybe with some more time on my hands I will dig into the X server code to find out what is going on. Even though I am able to use my touchpad, the scroll feature is still missing :( . Before this strange stuff happened, I could move my finger along the right edge of the touchpad to scroll my screen/documents; now I cannot.

Long post … time to go back to Guadalajara. I arrived in Puebla yesterday to give an Asterisk talk at ENLi. I was planning to go back last night; however, Tato did not show up to give his Asterisk presentation. Fortunately I talked with Sandino this morning and he said Tato is OK, so I gave the presentation this morning in his place. Gotta go, I’m quite tired …

Overriding the virtual table in a C++ object

Friday, September 14th, 2007

Before starting, I just want to mention that I recently joined http://www.planetalinux.org/ . It is a nice group of Linux bloggers; if you read this blog and want to read some interesting posts ( mostly in Spanish, but some guys post in English ), go there!

Having said that, let’s finish this post.

Yesterday I was discussing with a friend of mine how polymorphism is implemented in C++, and that is with a virtual table ( remember the “virtual” keyword in method declarations? ). A virtual table is, roughly speaking, just an array of function pointers. Every class with virtual methods gets a virtual table, and every object of such a class carries a pointer to it. So, where is the virtual table stored? I really don’t know, but I do know where to find the address of the virtual table associated with an object ( at least in g++ 4.1.1 ): the first sizeof(void*) bytes of an object store a pointer to the virtual table. With this knowledge, one could think that it is possible to overwrite the virtual table pointer of an object and call arbitrary functions, and yes, we can. Let’s see some fun code.


#include <iostream>
#include <cstring> /* for memcpy() */

using namespace std;

class Parent
{
    public:
        virtual void VirtFunc1() { cout << "Parent::VirtFunc1" << endl; }

        virtual void VirtFunc2() { cout << "Parent::VirtFunc2" << endl; }
};

class Child : public Parent
{
    public:

        void VirtFunc1() { cout << "Child::VirtFunc1" << endl; }
        void VirtFunc2() { cout << "Child::VirtFunc2" << endl; }
};

typedef void (*virtual_function)();

struct FakeVirtualTable {
    virtual_function virtual_one;

    virtual_function virtual_two;
};

void fake_virtual_one()
{
    cout << "Faked virtual call 1" << endl;
}

void fake_virtual_two()
{
    cout << "Faked virtual call 2" << endl;
}

int main()
{
    /* declare a Child class and a base pointer to it. */
    Child child_class_obj;
    Parent* parent_class_ptr = &child_class_obj;

    /* create our fake virtual table with pointers to our fake methods */
    FakeVirtualTable custom_table;
    custom_table.virtual_one = fake_virtual_one;

    custom_table.virtual_two = fake_virtual_two;

    /* take the address of our stack virtual table and override the real object pointer to the virtual table */
    FakeVirtualTable* table_ptr = &custom_table;

    memcpy(parent_class_ptr, &table_ptr, sizeof(void*));

    /* call the methods ( but we're really calling the faked functions ) */

    parent_class_ptr->VirtFunc1();
    parent_class_ptr->VirtFunc2();

    return 0;
}

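So, try to run that code and, of course, the expected result is that fake_virtual_one() and fake_virtual_two() get called. No magic there, we just replace the first sizeof(void*) bytes of the object with our own table pointer. Keep in mind that this is undefined behaviour that leans on how g++ lays out objects, and an optimizing compiler that can see the real type of the object may call Child’s methods directly and skip the table. I can’t think of a use for it right now, but it is fun …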

Overflow in AppConference

Sunday, September 2nd, 2007

Some weeks ago I got an offer for a free VoIPSurfer license. Why? Just because of a patch I wrote a year ago for an issue I had; the patch helped the guy who seems to own VoIPSurfer, so it is kind of open source karma. The offer reminded me of that interesting issue, and I thought today was a good day to write about it. Some background can be found in this thread, where I discussed the issue with some Asterisk developers: http://lists.digium.com/pipermail/asterisk-dev/2006-November/024616.html

Asterisk, as a software PBX, supports conferencing. However, the default implementation in the 1.2.x versions did not support native audio mixing. The default conferencing application is “app_meetme”, and it works only if you have zaptel hardware available or if the zaptel driver zt_dummy.ko is installed; otherwise it cannot work, because meetme() uses the zt_conf functionality. Because of this I started looking for alternatives that did not require zaptel. I found app_conference, which at that time was part of the IaxClient project; now it seems to be an independent project.

Anyway, I started doing some testing and kaboom! Asterisk crashed when one of our IAX2 servers joined the conference. Running valgrind led me to code like this:

void mix_slinear_frames( char *dst, const char *src, int samples )
{

        if ( dst == NULL ) return ;

        if ( src == NULL ) return ;

        int i, val ;

        for ( i = 0 ; i < samples ; ++i )
        {

                val = ( (short*)dst )[i] + ( (short*)src )[i] ;

                if ( val > 0x7fff )
                {
                        ( (short*)dst )[i] = 0x7fff - 1 ;
                        continue ;
                }

                else if ( val < -0x7fff )
                {
                        ( (short*)dst )[i] = -0x7fff + 1 ;
                        continue ;
                }

                else
                {
                        ( (short*)dst )[i] = val ;
                        continue ;
                }
        }

        return ;
}

This is a typical function that puts us in buffer overflow danger. It receives 2 buffers as arguments but just 1 length, so the caller had better be sure that both src and dst are big enough to read from and write to. I quickly found a call like this:

// allocate a mix buffer which fill large enough memory to
// hold a frame, and reset it’s memory so we don’t get noise
char* cp_listenerBuffer = malloc( AST_CONF_BUFFER_SIZE ) ;

memset( cp_listenerBuffer, 0x0, AST_CONF_BUFFER_SIZE ) ;

// point past the friendly offset right to the data

cp_listenerData = cp_listenerBuffer + AST_FRIENDLY_OFFSET ;

// reset the spoken list pointer
cf_spoken = frames_in ;

// really mix the audio
for ( ; cf_spoken != NULL ; cf_spoken = cf_spoken->next )
{

    //
    // if the members are equal, and they
    // are not null, do not mix them.
    //
    if (
      ( cf_send->member == cf_spoken->member )
       && ( cf_send->member != NULL )
    )
    {

        // don’t mix this frame
    }
    else if ( cf_spoken->fr == NULL )
    {

        ast_log( LOG_WARNING, "unable to mix conf_frame with null ast_frame\n" ) ;
    }
    else
    {

        // mix the new frame in with the existing buffer
        mix_slinear_frames( cp_listenerData, (char*)( cf_spoken->fr->data ), cf_spoken->fr->samples);
     }
}

So let’s look closely at the call. The first argument passed is cp_listenerData, a pointer to a brand new voice frame where the mix of all the conference audio sources will be stored. As we can see in the code, the buffer has a fixed length of AST_CONF_BUFFER_SIZE, but since the pointer is advanced by AST_FRIENDLY_OFFSET, the effective length is AST_CONF_BUFFER_SIZE - AST_FRIENDLY_OFFSET. Let’s see some interesting defines:

// 160 samples 16-bit signed linear
#define AST_CONF_BLOCK_SAMPLES 160

// 2 bytes per sample ( i.e. 16-bit )
#define AST_CONF_BYTES_PER_SAMPLE 2

// 320 bytes for each 160 sample frame of 16-bit audio
#define AST_CONF_FRAME_DATA_SIZE 320

// 1000 ms-per-second / 20 ms-per-frame = 50 frames-per-second
#define AST_CONF_FRAMES_PER_SECOND ( 1000 / AST_CONF_FRAME_INTERVAL )

// account for friendly offset when allocating buffer for frame
#define AST_CONF_BUFFER_SIZE ( AST_CONF_FRAME_DATA_SIZE + AST_FRIENDLY_OFFSET )

So AST_CONF_BUFFER_SIZE is 320 + AST_FRIENDLY_OFFSET. We don’t really care about the Asterisk friendly offset, because the *real* data must start after it, so we can say our buffer is 320 bytes long. Where does this number come from? Well, some assumptions are made to calculate it:

1. The most common sampling rate in telephony is 8000 samples per second.
2. Audio frames will be 20ms long.
3. Each sample will be encoded using 16 bits ( “slinear” format ).

Given those assumptions we can then calculate: at 8000 samples per second, a 20ms audio frame will be made of

( 8000 samples per second * ( 20ms / 1000ms per second) )

that is, 160 samples in one 20ms audio frame. Since each of those samples is encoded with 16 bits, we have an audio frame byte length of:

( 160 samples * 16 bits per sample / 8 bits per byte ), that is 320 bytes, hence #define AST_CONF_FRAME_DATA_SIZE 320

The assumption of 8000 samples per second is an Asterisk requirement; currently Asterisk does not support any other sampling rate. 16 bits per sample is assumed because the mixing is done in slinear format, so this is OK too. However, 20ms audio frames is a *bad* assumption. Some common phones ( Linksys SPA? ) provide an option in their configuration page to change the RTP frame size, and that caused the crash. How many bytes does a 30ms frame have?

( 8000 samples per second * ( 30ms / 1000ms per second) )

that is 240 samples. At 2 bytes ( 16 bits ) per sample, that gives us a frame size of 480 bytes! Far more than the expected 320 bytes, and kaboom! We have a buffer overflow that in the best case could lead to a DoS attack.

Asterisk smoothers to the rescue!

Asterisk smoothers are a way to take frames of varying size as input and get frames of a single size back. So, we can feed the smoother with 480-byte frames, or whichever size we get, and the smoother will return frames of the size we want ( 320 bytes in this case ).

struct ast_smoother *ast_smoother_new(int size);
int ast_smoother_feed(struct ast_smoother *s, struct ast_frame *f, int swap);
struct ast_frame *ast_smoother_read(struct ast_smoother *s);
void ast_smoother_free(struct ast_smoother *s);

So, I used the ast_smoother interface to smooth each frame that carried more samples ( hence more bytes ) than the slinear audio mixing expects. Each time a new frame was read from a conference member, I ran this check and partial fix:

/* create the smoother if we're receiving more samples than needed */
if ( AST_CONF_BLOCK_SAMPLES < fr->samples ) {

  if ( member->inSmoother == NULL ) {
    /* calculate bytes per sample */

    ast_log(LOG_DEBUG, "%s frame has %d samples in %d bytes\n", member->channel_name, fr->samples, fr->datalen);

    float bytes_per_sample = ( (float)fr->datalen / (float)fr->samples );

    ast_log(LOG_DEBUG, "%s frame has %f bytes per sample\n", member->channel_name, bytes_per_sample);

    float new_frame_len    = ( AST_CONF_BLOCK_SAMPLES * bytes_per_sample );
    /* WARNING, currently iLBC codec is not fully supported, sound is still choppy */

    if ( fr->subclass == AST_FORMAT_ILBC ) {
      ast_log(LOG_DEBUG, "ILBC format only accepts datalen multiple of 50, so make it happy\n");

      if ( new_frame_len < 50 ) {
        new_frame_len = 50;
      }
    }

    ast_log(LOG_DEBUG, "%s new frame size is %f\n", member->channel_name, new_frame_len);

    member->inSmoother   = ast_smoother_new(new_frame_len);
  }
  ast_smoother_feed(member->inSmoother, fr);

  while ( ( sfr = ast_smoother_read( member->inSmoother ) ) ) {

    conf_frame* cfr = create_conf_frame( member, member->inFrames, sfr ) ;

    if ( cfr == NULL )
    {
      ast_log( LOG_ERROR, "unable to malloc conf_frame\n" ) ;

      return -1 ;
    }
    add_member_frame(member, cfr);
  }
} else {

  conf_frame* cfr = create_conf_frame( member, member->inFrames, fr ) ;

  add_member_frame(member, cfr);
}

Messy, but hey! It compiled! Ship it! ( I don’t remember why the hell I used float for bytes_per_sample, but I’m sure I had a good reason?? )

So, what I did was just calculate the proper frame size whenever more samples than needed were received, so that, once converted to slinear, each frame comes out at the 320 bytes that mix_slinear_frames expects and the audio is mixed properly without overflowing.

Anyway, I still had a problem I did not solve at that time. For some reason the iLBC codec only accepted frame sizes that are a multiple of 50, and since the size the slinear mixing required was not a multiple of 50, the voice sounded choppy :(

I guess some improvements have been made to app_conference since I last used it, so I will test it with Asterisk 1.4 and see how it goes …

I’ll post results later.

Packing structures and classes

Thursday, August 16th, 2007

In my previous post I talked about memory alignment and how the compiler can add padding to structures and classes. Sometimes you don’t want the extra padding, say, because you are going to use the structure or class object as a data header to be sent over the network. For simplicity’s sake let’s take the structure we saw in my previous post:

struct aligned {
    long long_data;
    char byte_data;
    int integer_data;
};

Never mind that I would never send such a structure over the network ( you never know which computer architecture is going to receive the data, so you have to consider endianness, the size of ints, etc. ); the important thing here is that, with padding, that structure is likely to have a size of 12 bytes ( see the previous post to know why ). If we want to avoid the padding, and we are using the GCC compiler, we can use the packed attribute, like this:

struct aligned {
    long long_data;
    char byte_data;
    int integer_data;
} __attribute__ ((packed));

Using sizeof() on this structure shows the expected 9 bytes on 32-bit x86. Other compilers, like Visual Studio, use a #pragma directive instead ( #pragma pack ); I am not quite sure how it is used there, but the effect is the same.

Packing a struct or class is easy. However, there are some considerations we should be aware of. Let’s say we have this program:

#include <iostream>

using namespace std;
struct packedstruct {
    int  integer;
    char byte;
} __attribute__ ((packed));

void func(int &val)
{
    cout << "calling func\n";

    val += 10;
}

int main()
{
    packedstruct mypackage;

    mypackage.integer = 10;
    func(mypackage.integer);

    cout << "val is " << mypackage.integer << "\n" << endl;

    return 0;
}

What’s the problem with that? Well, trying to compile that code with a recent GCC ( 4.1.0 ) results in an error like this:

packed.cpp:20: error: cannot bind packed field ‘mypackage.packedstruct::integer’ to ‘int&’

It seems that the representation of packed/unaligned data is somehow not fully compatible with aligned data? We hit this issue in some code. I have not yet understood exactly what it means, but making the reference “const” gets rid of the error. Of course, if you really need the reference to be non-const, how do you proceed? I would like to know what is really going on here, so if anyone out there knows, please let me know. I found this link, and I’m still chewing on it to understand the implications of what that guy says about the LSB … http://gcc.gnu.org/ml/gcc-patches/2003-07/msg01664.html

Memory Alignment

Sunday, August 5th, 2007

I have been busy lately; however, from now on I will try to post at least twice a month. Today I want to post about an interesting issue I had at work that is related to endianness and memory alignment. I knew the concept of endianness, but had never faced a bug related to it. However, 2 months ago ( yeah, I should have posted this before, but you all know that I am a procrastinator ), we detected a failure in one of our test cases for the ODBC driver to connect to the iSeries. The test case was failing on the 32-bit PowerPC architecture only; the 32-bit x86 test was successful.

So, I logged on via ssh to the test server LPAR and started debugging the problem. I bless gdb and vim, because I did not have to install anything else to debug/develop on site. Finally I found the problem in code similar to this one:

class ConvertHandleToObject {
    /* lots of code here */
    union {
        void* void_ptr;
        TYPE1* type1_ptr;
        TYPE2* type2_ptr;
        TYPE3* type3_ptr;
        struct {
            unsigned free:1;
            unsigned next:31;
        };
    };
};

The “free” member was used to flag that the ConvertHandleToObject object was available to be used/returned as a handle. But why share memory space with the actual pointer to the handle object? And, more importantly, why does it work on x86 but not on PowerPC?

The reason behind it is memory alignment.

To understand memory alignment we must first introduce “aligned memory access” and “unaligned memory access”. Aligned memory access is when the processor fetches a data object of size N stored at a memory address that is a multiple of N. That is, if we want to access a 32-bit integer ( 4 bytes ) in aligned fashion, the integer must be stored at a memory address that is a multiple of 4 ( 0x04, 0x08, etc. ), and that’s why access to “char” values is always aligned ( sizeof(char) == 1 ). Unaligned access is the opposite: fetching a data object of size N stored at a memory address that is NOT a multiple of N, like fetching a 32-bit integer at memory address 0x03. Alignment is important because access to aligned data is faster. Fortunately for us, compilers take care of aligning data. Let’s see an example:

struct aligned {
    long long_data;
    char byte_data;
    int integer_data;
};

This is a classic example of memory alignment. Someone might say that a structure like that has a size of sizeof(long) + sizeof(char) + sizeof(int), that is, 9 bytes on a common 32-bit architecture. However, if you print sizeof(struct aligned) you will most likely get 12 bytes. So where do those 3 extra bytes come from? Well, the compiler added 3 bytes of padding to align the start address of the “integer_data” member. Let’s say a structure like this is stored at address 0x00; then long_data is at 0x00, byte_data is at 0x04 and, if the compiler ignored alignment requirements, integer_data would start at 0x05, and we would have an unaligned memory access when reading the integer_data member. So the compiler adds 3 padding bytes, and the aligned struct is equivalent to:

struct aligned {
    long long_data;
    char byte_data;
    char padding[3];
    int integer_data;
};

If you compare the size of this struct with the size of the previous one, you will find that both are 12 bytes long.

Let’s go back to our buggy union. A union takes the alignment of its most strictly aligned member; in our union that is any of the pointers ( 4 bytes on a 32-bit architecture ), so the union will be aligned on 4-byte multiples. Those unions were initialized with free = 1, but there was no place in the code that set free = 0. The programmer thought that any address aligned on a 4-byte multiple has its last bit set to 0, so whenever a valid pointer was assigned to any of the other union members, the least significant bit of the least significant byte would be zero. Why? Well, for the last byte, multiples of 4 in binary are:

00000100 ( 4 )
00001000 ( 8 )
00001100 ( 12 )
00010000 ( 16 )

The code worked well on little endian; confusing code, but it worked. However, once this was moved to PowerPC ( big endian ), the bit assigned to free was no longer the least significant bit of the least significant byte but the most significant one, and the code broke: even when a valid address had been assigned to some pointer in the union, free could still read as 1 ( that is, still marked as free ).

Even with this in mind, there are 2 things I don’t quite understand.

1. The failure was present only in RedHat, not in SuSE. I don’t remember distribution and kernel versions.
2. The failure was present only when launching 2 or more threads.

Somehow allocated memory in SuSE did not hit this issue. When not using threads, RedHat allocations did not hit the issue either. Here are some examples of the memory addresses.

Good:
00010000 00000011 11001000 01100000 ( 0x1003C860 )

00010000 00000011 10110010 11011000 ( 0x1003B2D8 )

Bad:

11110110 01100000 00100110 10111000 ( 0xF66026B8 )

Failures started when memory addresses started with F. On little endian, as we can see, the address ends in 8, so the least significant bit is 0 ( setting “free” to zero ), but on big endian “free” maps to the most significant bit, which is 1, causing the failure.

Conclusion: C and C++ are low-level languages, and you can usually do funny stuff with the hardware. But if it is not necessary, don’t do it. The code fix was easy: just move the free member out of the union so it does not depend on the memory address, and set free = 0 while the handle is in use.