findings on VLMs

acknowledgements

Tomáš Daniš for helping me understand the AnyMAL paper and for checking my codebase for errors. this project would not have happened without his help.

vik for moondream1, which inspired me to start working on VLMs, and for moondream2, which helped me make an image-caption dataset.

finally, thanks for the great test images used in this post.

background

after spending a long time toying around with small projects and shelving them, i decided it was time for me to start working on something bigger.
how a neuron learns

warning: this is an extremely simplified explanation. if you want to delve deeper, there are excellent resources like Andrew Ng’s Deep Learning Specialization or Neural Networks from Scratch in Python.
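this is a neural network. you’ve likely seen one before and have at least a vague idea of what it does: it learns. what you may not know is that neural networks can be thought of as universal function approximators: given enough neurons, they can approximate essentially any continuous function as closely as you like.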
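to make "universal function approximator" a bit more concrete, here is a minimal sketch that is not from the original post. it builds a network with one hidden layer of ReLU neurons and sets the weights by hand (nothing is learned here) so that the network traces a piecewise-linear copy of sin(x) on [0, π]. the helper names `build_relu_net` and `max_error`, and the choice of sin as the target, are illustrative assumptions.

```python
import math


def relu(z: float) -> float:
    # the activation each hidden neuron applies to its weighted input
    return max(0.0, z)


def build_relu_net(f, a: float, b: float, n_hidden: int):
    """hypothetical helper: hand-set the weights of a one-hidden-layer ReLU net
    so it linearly interpolates f at n_hidden + 1 evenly spaced points on [a, b]."""
    knots = [a + (b - a) * i / n_hidden for i in range(n_hidden + 1)]
    values = [f(x) for x in knots]
    # slope of each straight piece between neighbouring knots
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i]) for i in range(n_hidden)]
    # hidden neuron i has input weight 1 and bias -knots[i], so it "switches on" at knots[i]
    biases = [-knots[i] for i in range(n_hidden)]
    # its output weight is how much the slope changes at that knot
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    out_bias = values[0]

    def net(x: float) -> float:
        hidden = [relu(x + bias) for bias in biases]
        return out_bias + sum(w * h for w, h in zip(out_weights, hidden))

    return net


def max_error(f, net, a: float, b: float, samples: int = 1000) -> float:
    # worst absolute gap between f and the network on a fine grid over [a, b]
    xs = [a + (b - a) * j / samples for j in range(samples + 1)]
    return max(abs(f(x) - net(x)) for x in xs)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        net = build_relu_net(math.sin, 0.0, math.pi, n)
        print(f"{n:>2} hidden neurons -> max error {max_error(math.sin, net, 0.0, math.pi):.4f}")
```

running it should show the error shrinking as neurons are added, roughly by a factor of four each time the count doubles, which is what you'd expect from piecewise-linear interpolation. the weights are computed directly here only to show that such a network exists; a trained network would have to find weights like these on its own.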
knowing other people

knowing other people matters more than the work you have done.
anyone who spends a few days in a corporate environment will reach that conclusion quickly. it doesn’t take long to notice someone who misses deadlines, spends all day on their phone, and has no clear skill beyond being a smooth talker.
it’s easy to grow resentful of that, especially if, like me, you have been taught all your life that hard work is what matters most.