In my August 2020 article, “How to choose a cloud machine learning platform,” my first guideline for choosing a platform was, “Be close to your data.” Keeping the code near the data is necessary to keep latency low, since the speed of light limits how quickly data can travel over distance. After all, machine learning, especially deep learning, tends to go through all your data multiple times (each pass is called an epoch).
I said at the time that the ideal case for very large data sets is to build the model where the data already resides, so that no mass data transmission is needed. Several databases support this kind of in-database model building to a limited extent. The natural next question is: Which databases support internal machine learning, and how do they do it? I’ll discuss those databases in alphabetical order.