Known Issue: The ByteBlower process refuses to start: Failed to allocate 1156MB hostbuffers for numa node 1
Posted by Vincent De Maertelaere, Last modified by Dries Decock on 03 April 2018 02:03 PM
In rare cases, the ByteBlower process fails to start. One of the errors that can be encountered is the allocation failure quoted in the title of this article.
This article explains what happens and how to get out of this state.
Computers have RAM, and processes use this memory. While a computer runs tasks over a longer period, large parts of the memory are allocated by processes and freed again afterwards. The memory management part of the Linux kernel divides memory into so-called pages. When a program requests memory, one or more pages are reserved for that program. When the program terminates, that memory is freed (deallocated) again.
A side effect of this is that memory can become fragmented. When a program requests a certain amount of memory (e.g. 100 megabytes), the kernel tries to find that number of contiguous pages if they are available; otherwise it may hand out several smaller fragments instead.
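To make the effect concrete, here is a minimal toy simulation. It assumes a flat array of 16 pages and a simple first-fit search for contiguous runs; the real Linux kernel uses a more elaborate (buddy) allocator, so treat this only as an illustration of the idea. The helper names `find_contiguous`, `allocate` and `release` are hypothetical. After allocating and freeing blocks in an alternating pattern, half of the memory is free, yet no run of 4 contiguous pages exists.

```python
def find_contiguous(free, n):
    """First-fit search: return the index of the first run of n free pages, or None."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == n:
            return i - n + 1
    return None


def allocate(free, n):
    """Reserve n contiguous pages; return the start index, or None if no run is large enough."""
    start = find_contiguous(free, n)
    if start is None:
        return None
    for i in range(start, start + n):
        free[i] = False
    return start


def release(free, start, n):
    """Mark n pages starting at start as free again."""
    for i in range(start, start + n):
        free[i] = True


PAGES = 16  # hypothetical tiny memory of 16 pages
free = [True] * PAGES

# Fill memory completely with 2-page allocations (starts 0, 2, 4, ..., 14).
blocks = [allocate(free, 2) for _ in range(PAGES // 2)]

# Free every other block: 8 pages become free, but only in 2-page holes.
for start in blocks[::2]:
    release(free, start, 2)

print("free pages:", sum(free))
print("start of a 4-page contiguous request:", allocate(free, 4))
```

Running it prints 8 free pages, while the 4-page contiguous request returns `None`: enough memory in total, but not in one piece.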
Some parts of the ByteBlower process require large chunks of memory which, for technical reasons, must not be fragmented. If the kernel cannot provide such a contiguous chunk, that allocation fails and this specific error is reported.
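On Linux, the kernel reports per NUMA node how many free physically contiguous blocks of each size ("order") remain in `/proc/buddyinfo`; each column counts free blocks of 2^order pages. The sketch below parses that format to show, per node, the total free memory versus the largest single free block. It assumes 4 KiB pages and works on sample text so it runs anywhere; it is a general fragmentation diagnostic and does not claim to reproduce how ByteBlower sizes or allocates its hostbuffers. The function names and sample values are hypothetical.

```python
import re
from collections import defaultdict

PAGE_SIZE_KB = 4  # assumption: 4 KiB base pages (the x86-64 default)


def parse_buddyinfo(text):
    """Return {node: {zone: [free block count for order 0, 1, 2, ...]}}."""
    result = defaultdict(dict)
    for line in text.splitlines():
        m = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)", line)
        if not m:
            continue
        node, zone, counts = int(m.group(1)), m.group(2), m.group(3)
        result[node][zone] = [int(c) for c in counts.split()]
    return dict(result)


def largest_free_block_kb(counts):
    """Size in KiB of the largest free contiguous block in one zone, 0 if none."""
    for order in range(len(counts) - 1, -1, -1):
        if counts[order] > 0:
            return PAGE_SIZE_KB * 2 ** order
    return 0


def summarize(text):
    """Return {node: (total free KiB, largest free contiguous block KiB)}."""
    summary = {}
    for node, zones in parse_buddyinfo(text).items():
        total = sum(
            PAGE_SIZE_KB * 2 ** order * n
            for counts in zones.values()
            for order, n in enumerate(counts)
        )
        largest = max(largest_free_block_kb(c) for c in zones.values())
        summary[node] = (total, largest)
    return summary


# Hypothetical sample: node 1 has tens of MB free, but only in small pieces.
SAMPLE = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 1, zone   Normal   5000   2000    100     10      0      0      0      0      0      0      0
"""

for node, (total_kb, largest_kb) in sorted(summarize(SAMPLE).items()):
    print(f"node {node}: {total_kb} KiB free, largest contiguous block {largest_kb} KiB")
```

On a live system, `summarize(open("/proc/buddyinfo").read())` would give the same overview; a node whose free total is large but whose largest block is small is heavily fragmented.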
The bad news is that there is no way to force the kernel to rearrange all memory so that large contiguous chunks become available again. That would require moving memory that running processes may be using at that very moment. The only remedy is therefore to reboot the server.