Why might Linux driver developers hate distros like CentOS?

A recent incident from my life.

I was dealing with CentOS 7.5. Its base kernel is 3.10.0. Keep that in mind.

The CentOS kernel 3.10.0-862 contains an ALSA stack backported from newer kernels. The new stack drops the snd_card_create call and replaces it with snd_card_new, which is mostly the same but takes an additional argument at the front of the parameter list:

int snd_card_create(                       int idx, const char *xid, struct module *module, int extra_size, struct snd_card **card_ret)
vs
int snd_card_new   (struct device *parent, int idx, const char *xid, struct module *module, int extra_size, struct snd_card **card_ret);

(arguments aligned by me)

But a lot of in-kernel drivers still refer to the old snd_card_create, so the CentOS maintainers added the following hack:

static inline int __deprecated
snd_card_create(int idx, const char *id, struct module *module, int extra_size,
        struct snd_card **ret)
{
    return snd_card_new(NULL, idx, id, module, extra_size, ret);
}

which just wraps snd_card_new and passes NULL as the first argument.

But this is wrong, because snd_card_new contains the following lines:

int snd_card_new(struct device *parent, int idx, const char *xid,
		    struct module *module, int extra_size,
		    struct snd_card **card_ret)
{
...
	card->dev = parent; // [1]
...
	snprintf(card->irq_descr, sizeof(card->irq_descr), "%s:%s",
		 dev_driver_string(card->dev), dev_name(&card->card_dev));
...
}

Nothing looks wrong at first glance. But… look at this call:

dev_driver_string(card->dev)

Look at [1]: card->dev is just NULL. Now look at dev_driver_string, which is a simple function:

const char *dev_driver_string(const struct device *dev)
{
        struct device_driver *drv;
 
        /* dev->driver can change to NULL underneath us because of unbinding,
         * so be careful about accessing it.  dev->bus and dev->class should
         * never change once they are set, so they don't need special care.
         */
        drv = ACCESS_ONCE(dev->driver);
        return drv ? drv->name :
                        (dev->bus ? dev->bus->name :
                        (dev->class ? dev->class->name : ""));
}

Note that dev is NULL here. The very first access, dev->driver, already dereferences NULL, and so would every other access in this function!
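To make the failure mode concrete, here is a minimal userspace sketch of the same driver, bus, class fallback chain with a NULL check in front of it. The struct definitions are cut-down stand-ins that carry only the fields the function reads, and dev_driver_string_or_empty is a hypothetical name of mine, not a kernel function and not something CentOS ships.

```c
#include <stddef.h>

/* Stand-ins for the kernel structs: only the fields the lookup reads. */
struct device_driver { const char *name; };
struct bus_type      { const char *name; };
struct device_class  { const char *name; };

struct device {
    struct device_driver *driver;
    struct bus_type      *bus;
    struct device_class  *dev_class;
};

/*
 * Hypothetical NULL-tolerant variant (an assumption for illustration):
 * same driver -> bus -> class fallback as dev_driver_string(), but a
 * NULL device yields "" instead of a NULL pointer dereference.
 */
static const char *dev_driver_string_or_empty(const struct device *dev)
{
    if (!dev)
        return "";
    if (dev->driver)
        return dev->driver->name;
    if (dev->bus)
        return dev->bus->name;
    if (dev->dev_class)
        return dev->dev_class->name;
    return "";
}
```

With the NULL parent that the deprecated wrapper passes, this variant would return an empty string where the real function faults.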

Now look at our driver. We use the old ALSA API for kernels older than 3.15.0 and the new one for anything newer. The base kernel version for CentOS is 3.10, so we select the old API call. And when our driver is loaded, the kernel panics.

So we need an additional check like:

#ifdef RHEL_RELEASE_CODE
#  define HAVE_SND_CARD_NEW (RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,5))
#else
#  define HAVE_SND_CARD_NEW (LINUX_VERSION_CODE >= KERNEL_VERSION(3,15,0))
#endif
 
#if HAVE_SND_CARD_NEW
    // use new API here
#else
    // use old API here
#endif
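To see how this check behaves for different kernels, here is a small userspace model of it. The two macros use the classic encodings from linux/version.h (RHEL_RELEASE_VERSION exists only in Red Hat kernel headers); have_snd_card_new is a hypothetical helper of mine that mirrors the preprocessor logic at run time, with rhel_code == 0 standing in for "RHEL_RELEASE_CODE is not defined".

```c
/* Classic encodings, as found in linux/version.h. */
#define KERNEL_VERSION(a, b, c)    (((a) << 16) + ((b) << 8) + (c))
#define RHEL_RELEASE_VERSION(a, b) (((a) << 8) + (b))

/*
 * Hypothetical helper: run-time mirror of the HAVE_SND_CARD_NEW check.
 * On a RHEL-style kernel only the release code is consulted; otherwise
 * the decision falls back to the upstream kernel version.
 */
static int have_snd_card_new(int linux_code, int rhel_code)
{
    if (rhel_code != 0)
        return rhel_code >= RHEL_RELEASE_VERSION(7, 5);
    return linux_code >= KERNEL_VERSION(3, 15, 0);
}
```

For CentOS 7.5, have_snd_card_new(KERNEL_VERSION(3,10,0), RHEL_RELEASE_VERSION(7,5)) selects the new API even though the base kernel is 3.10, while a vanilla 3.10 kernel (rhel_code of 0) still gets the old one.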

The OpenSUSE maintainers do the same ugly things. On top of that, they do not provide any way to detect their kernel, nothing similar to RHEL_RELEASE_CODE. At all.