Commonsense-Based Cross-Media Information Retrieval
I’ve been thinking about commonsense computing and information retrieval, and here’s my idea.
Since today’s content analysis work mostly depends on domain knowledge, it is essentially limited to those domains.
It’s nearly impossible to find a general way to make computers understand what’s going on in any medium, whether images, videos, or 3D models. I believe the reason is that what we’re using are merely low-level features, which by nature cannot produce high-level analytical results unless we introduce related high-level information.
Yesterday, when I stopped my motorcycle in front of a restaurant and saw a bicycle beside me, “a person riding on the bicycle” came to my mind as soon as I looked at the bike’s shape. I immediately realized that the shape of a thing in our lives is highly related to its function, which is something long missing in modern information retrieval techniques. That is, 3D models are recognized through totally different approaches from videos. This is a serious problem, because the function of a thing actually helps us, at least me, in the recognition process. I wouldn’t be able to recognize many things at first sight without thinking of their function. If the recognition/classification of all kinds of media could be processed according to the inherent CONCEPT of a medium instead of its SHAPE, COLOR, and other low-level features, then we might get somewhere extremely different.
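To make the idea a bit more concrete, here is a minimal sketch of what concept-level, cross-media indexing might look like. Everything in it is an assumption of mine for illustration: the tiny COMMONSENSE table stands in for a real knowledge base such as ConceptNet, the labels are given by hand rather than produced by per-medium recognizers, and ConceptIndex is a hypothetical structure, not an existing library.

```python
# A minimal sketch of indexing media by concepts instead of low-level features.
# The commonsense table below is a toy stand-in for a real knowledge base
# (e.g., ConceptNet); all entries and names here are illustrative assumptions.

from collections import defaultdict

# Toy commonsense relations: object concept -> functional concepts.
COMMONSENSE = {
    "bicycle":    {"riding", "transportation"},
    "motorcycle": {"riding", "transportation"},
    "chair":      {"sitting"},
    "restaurant": {"eating"},
}

def expand_to_concepts(label: str) -> set:
    """Map a recognized object label to itself plus its functional concepts."""
    return {label} | COMMONSENSE.get(label, set())

class ConceptIndex:
    """Index media items of any type (image, video, 3D model) by shared concepts."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, item_id: str, labels: list) -> None:
        # In a real system the labels would come from per-medium recognizers
        # (image classifier, video tagger, 3D shape matcher); here they are
        # supplied by hand.
        for label in labels:
            for concept in expand_to_concepts(label):
                self._index[concept].add(item_id)

    def query(self, concept: str) -> set:
        return self._index.get(concept, set())

if __name__ == "__main__":
    index = ConceptIndex()
    index.add("photo_042.jpg", ["bicycle"])
    index.add("clip_007.mp4",  ["motorcycle"])
    index.add("model_013.obj", ["chair"])
    # A query for the functional concept "riding" crosses media types,
    # returning both the photo and the video clip.
    print(index.query("riding"))  # {'photo_042.jpg', 'clip_007.mp4'}
```

The point of the sketch is only that a query on a functional concept like “riding” retrieves an image and a video together, even though their low-level features (pixels vs. frames) have nothing in common.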