Ideas, Ideas, Ideas
Saturday, October 22, 2005
What Am I Gonna Wear Today?
The story started last week.
In Commonsense Computing and Interactive Applications,
the class my advisor Henry teaches,
on the Thursday a week before the first assignment's deadline, I really couldn't think of anything to build.
I was afraid that if I stalled any longer I'd run out of time completely.
Since it was my own advisor's class, a lame demo would be embarrassing,
so I grudgingly went with an idea I'd originally dismissed as too simple, not cool enough:
a little program that reads a sentence, judges the occasion, and decides which clothes in your closet to wear.
I decided to just build it first, get something at least presentable, and improve it slowly later.
Otherwise the pressure was going to crush me.
Then somehow, slumped over my desk that afternoon,
an idea drifted hazily by just before I fell asleep.
When I woke up I was completely fired up, and the code came pouring out.
Maybe because Python, which I had never used before, really is that handy and easy to learn,
I felt like I was about to take off as I typed. So fast.
Before long, by evening, it was done.
When the results came out,
I forget whether the input sentence was "I am going to a wedding" or something like that,
but anyway it showed three pieces of clothing. It totally worked.
I shouted "YES!" out loud, probably startling who knows how many people nearby.
I was so excited. How did I get it working this fast, this easily?
I yanked every cable out of my computer, grabbed it, and dashed out,
unable to resist finding someone, anyone, to show it to.
* * * * *
Honestly, a little program like this is nothing truly impressive,
but I was thrilled in a way that's hard to describe.
Not only because it was my first small accomplishment here,
but because I suddenly understood
why Hugo can publish four or five papers a year of the kind others can't manage one of in two or three years,
and why his output strikes people as almost frighteningly abundant. How can anyone be this productive?
Because programs like this are so much easier to write than the 3D graphics work we used to do.
The distance between having an idea and actually realizing it is so short
that ideas just keep bubbling up, one after another, and the systems keep coming out right behind them.
Every little while, the project page grows something new.
Of course there are plenty of aspects worth questioning along the way, and the merits deserve careful evaluation,
but getting something built in one burst simply feels great.
As it turned out, quite a few people liked it.
It's only a small program without many features,
but at least you can type in whatever you want.
Later, on the day of class, when I demoed it,
the response was pretty good; it drew more attention than some other people's work, at least.
I was pretty happy.
* * * * *
After the demo, I figured this little toy had served its purpose and that was that.
I didn't expect a sequel.
This week is the Media Lab's sponsor week.
An absurd number of executives and engineers from world-famous companies
all descend on the Media Lab to see just what we've been cooking up
with the millions of dollars they pay us every year.
Originally I was supposed to show the master's thesis of Alex, a student of Henry's who just graduated.
He's continuing his studies at CMU, so we all figured there was no way he'd come back to demo.
But on Friday, when we were just about ready,
we got his email apologizing for deciding so suddenly and so late:
he was coming back to demo his own system, so my help was no longer needed.
I was completely dumbfounded.
So Henry wrote back asking whether I'd consider showing my clothes assignment
(it didn't really feel like asking; more like telling).
So on Saturday I spent a full twelve-hour day
adding a GUI to what had been a text-only program,
so the results were no longer a scary black screen with a flood of scrolling text
but three photos of clothes instead. Much friendlier.
Then I wrote the description below and sent it to Henry.
His reply came back with one short line:
"Great! Let's go for it."
In the month or so since I got here, that counts as the first time he's praised me like that.
What Am I Gonna Wear Today?
It happens before dinners, meetings, movies, visits, or even just a walk. We all ask ourselves the same question almost every day, for every activity we participate in - "What am I gonna wear?". Making the decision tends to be tricky or even bothersome for many of us, because what we wear carries many cues that matter to our social interactions, such as manners, tastes, and what the people we meet mean to us. Viewing this process computationally, we claim that deciding what to put on is in fact striding across the gap and searching for the match between two concepts: a) how the clothes express our characters, and b) what the events or people we are to meet mean to us. In this project, we attempt to apply commonsense computing techniques to achieve this concept-matching goal and provide users with helpful suggestions. Five dimensions are built particularly for describing the styles or functions of clothes/accessories: luxurious<->simple, formal<->casual, sexy<->conservative, modern<->classic, and elegant<->vulgar. We weighted several common brands (e.g., sisley, nike) and wearables (e.g., jacket, glasses) in these dimensions, along with a word list containing a few dozen words about events, places, and social relationships. Using the spreading activation technique, the concepts carried by the words in the word list are spread to all the other terms in ConceptNet, so that the system is capable of providing suggestions in general cases.
The scenario of using this system goes as follows. After providing a set of descriptions of the clothes and accessories he/she owns, the user only needs to type in any sentence he/she wishes (e.g., "I am going to a dinner with my girlfriend's parents." or "I feel relaxed today."), and the system will find the most suitable choices accordingly. The more fully the user describes the wearables and the input sentences, the more accurately the system can capture his/her needs, and the more helpful the suggestions will be. The system is now at a preliminary stage, and user testing will be carried out after more complete development.
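For anyone curious how the matching might look in code, here is a minimal Python sketch of the idea. Everything in it is a made-up stand-in: the tiny graph takes the place of ConceptNet, and all the dimension values, items, and link weights are invented for illustration rather than taken from the real system.

DIMENSIONS = ["luxurious", "formal", "sexy", "modern", "elegant"]

# Hypothetical style profiles for a few wardrobe items (one value per dimension).
wardrobe = {
    "black suit":         [0.6, 0.9, 0.1, 0.2, 0.7],
    "nike t-shirt":       [-0.5, -0.8, 0.0, 0.6, -0.2],
    "little black dress": [0.4, 0.3, 0.8, 0.5, 0.6],
}

# Stand-in for the weighted word list: event/place/relationship words
# positioned in the same five dimensions, plus links to related concepts.
profiles = {
    "wedding": [0.7, 0.9, 0.0, 0.0, 0.8],
    "dinner":  [0.3, 0.4, 0.2, 0.2, 0.4],
    "parents": [0.1, 0.5, -0.6, -0.2, 0.3],
}
links = {"wedding": [("dinner", 0.5)], "dinner": [("parents", 0.4)], "parents": []}

def spread(seeds, depth=2, decay=0.5):
    """Spreading activation: seed concepts pass a decayed share of their
    activation along links for a fixed number of hops."""
    activation = {s: 1.0 for s in seeds if s in profiles}
    frontier = dict(activation)
    for _ in range(depth):
        nxt = {}
        for node, act in frontier.items():
            for neighbor, w in links.get(node, []):
                nxt[neighbor] = max(nxt.get(neighbor, 0.0), act * w * decay)
        for node, act in nxt.items():
            activation[node] = max(activation.get(node, 0.0), act)
        frontier = nxt
    return activation

def suggest(sentence, top=3):
    """Project the activated concepts into the style space, then rank
    wardrobe items by a plain dot-product similarity."""
    words = sentence.lower().strip(".").split()
    activation = spread(words)
    target = [sum(act * profiles[c][i] for c, act in activation.items())
              for i in range(len(DIMENSIONS))]
    scores = {item: sum(t * v for t, v in zip(target, vec))
              for item, vec in wardrobe.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]

print(suggest("I am going to a wedding"))

The real system spreads over all of ConceptNet, of course; the point here is only the shape of the pipeline: activate concepts from the sentence, project them into the five dimensions, rank the closet.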
* * * * *
On Tuesday and Wednesday the whole Media Lab was packed with sponsors.
We worked hard telling everyone what we'd built and how they might use it,
until our throats gave out.
It's exactly like running a booth at a trade show back home, except this place isn't as big as the World Trade Center.
The first day brought a lot of telecom people, mostly interested in phones and home appliances,
so I just demoed dutifully, not expecting anyone to care much about my stuff.
Who knew that, as luck would have it,
two people would show up on Wednesday morning:
an Italian fashion designer (http://www.mattori.it)
and a Hugo Boss employee from Switzerland.
They were really interested in my work and thought it could be put to use.
I told them we were thinking about building a website everyone could come play with,
and they said they hoped I could finish it within a week
so they could show their bosses once they got back.
I was beyond excited!
I couldn't wait to tell Henry.
He was delighted too, saying, "You never know who's gonna come,"
"So you can see we can never do anything TOO crazy"
(meaning no matter how wild an idea is, it's never too much).
Apparently Hugo Boss is a prospective sponsor of the Media Lab,
still deciding whether to invest in us.
They also loved the show globe James (小鄧) made
and want him to fly to Milan and Florence in December to exhibit it.
Who knows, maybe the things the two of us built
just landed the Media Lab a new sponsor these two days.
So Francis and I talked it over and decided to build the website idea
we'd hashed out over food in Chinatown after shopping the other day.
Yesterday, Thursday night, we happened to be free,
so we knocked it out in one go.
I cleaned up the Python code, and Francis quickly built a page in Flash.
The two of us were pretty efficient
(though a bug in my code took forever to hunt down, which drove him nuts :P).
I kept saying how great it felt to get to work with a world-famous designer,
and he kept saying how pleasant it was to focus on design without touching code.
There's still plenty that hasn't been added yet, but at least it gives a rough feel for it:
http://dhcp-44-136.media.mit.edu:8000/fashion/
* * * * *
And then, from talking nonstop to an endless stream of people these past few days,
my head is now full of ideas to build.
For instance, based on the self-introduction you write on a dating site,
like, "I am a dog lover. I like rock music.
I am very outgoing, and like to go
to the parties..."
the system could analyze what different people
might want to wear for different kinds of occasions.
Or, going the other way, ask the user: if you were going on a date with your girlfriend,
which pieces would you pick?
Then find a way to remember the style of the pieces the user chose,
so similar occasions get similar suggestions later.
It could also pair with mobile phones, so someone browsing in a physical store could easily find out
which pants in the closet back home would match the piece in front of them.
Or else,
when packing for a trip, it could flag which pieces are too thick or heavy
and suggest combinations, so we don't have to drag a mountain of clothes along.
Anyway, there's plenty that could be done.
Everyone who sees this system has different ideas about it,
which is what I find most interesting.
Some visitors don't understand the technology much at all
and just keep talking about things from their own lives.
One mom started telling me how she lays out clothes for her kids
and how she looks after her parents' outfits,
going on about a pile of things with no direct connection to the technology,
yet to me it was extremely valuable feedback.
Furniture makers, electronics makers, people from all kinds of backgrounds
bring all kinds of different ideas.
It's been a really interesting experience.
My desk is now piled with the dozens of neckties
that Walter (a Media Lab professor) gave me
(I was genuinely startled when he brought them over).
I think what Henry said is really right:
"You can never do anything TOO crazy"
Thursday, October 13, 2005
The Problem of Specificity and Generality - Video Sequencing Weekly Notes Oct 12
I wrote a program yesterday for constructing relationships between two clips by analysing them with ConceptNet and spreading activation. It turned out that either the related concepts are too few and show no difference from keyword matching (when the spreading depth/width are small), or there is too much garbage among the related concepts the system suggests, and the concepts that people would regard as important cues are blurred and diluted (when the depth/width are larger).
The problem, as Barbara expected, comes from the fact that the set of concepts explodes incredibly fast and wide when we use spreading activation in ConceptNet, which is by nature a collection of general, yet shallow, common sense. The difficulty of applying this kind of tool to storytelling thus turns out to be that it can't narrow down to specific topics and trace the semantics the way humans do when two or more pieces of information are supplied. The two sentences "The director arranged the performers for a final rehearsal" and "Tom felt frustrated because he still couldn't find the beat" may actually be highly relevant in the context of a dance performance, but the word set ("director", "arrange", "performer", "final", "rehearsal") is simply too unrelated to ("feel", "frustrated", "find", "beat") in ConceptNet.
More specifically, the nodes of these two word sets sit so far apart in ConceptNet that it is understandable that the system cannot discover their high relevance. If we can fill the semantic gap between these two sentences with a broader, less specific description of the context (e.g., "This is a rehearsal for a dance performance. The director directs a group of performers on the stage......."), then it might be easier for the system to find that they are actually quite related.
Suppose we have the user write a story about the video they want to make, and the system automatically arranges the available clips into a sequence according to that story. The advantages are twofold. First, such a story can serve as the bridge that fills the semantic gaps among clips, so different clips can be related to one another by routing through the story sitting in the middle; the relationships among clips can thus be constructed, which would otherwise be difficult. Second, "inputting stories and getting video sequences" could actually be a pretty nice mode of interaction, since the output sequence would be generated according to the user's narrative, instead of by relatively meaningless correlations over words that aren't even guaranteed to be important concepts.
I read Hugo's Bubble Lexicon today, and I think it might be a proper tool for achieving this goal. While performing the path-finding process in Bubble Lexicon for reasoning, the ContextNodes are always activated and used to boost the paths that pass through them. If we can extract such ContextNodes from the user's story and activate them this way, the words in the two sets ("director", "arrange", "performer", "final", "rehearsal") and ("feel", "frustrated", "find", "beat") would become much more related.
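To make the context-boosting intuition concrete, here is a toy sketch in Python. This is only my guess at the mechanism, not Bubble Lexicon's actual algorithm: every node, edge, and weight below is invented, and the boost rule is just a multiplier on edges that touch an active ContextNode.

# Toy semantic graph standing in for ConceptNet/Bubble Lexicon;
# all edges and weights are invented for illustration.
edges = {
    ("rehearsal", "performance"): 0.8,
    ("performer", "performance"): 0.8,
    ("director", "performance"): 0.6,
    ("beat", "music"): 0.7,
    ("music", "dance"): 0.7,
    ("dance", "performance"): 0.8,
}

def weight(a, b, context, boost=2.0):
    """Edge weight, boosted when either endpoint is an active ContextNode."""
    w = edges.get((a, b)) or edges.get((b, a)) or 0.0
    if a in context or b in context:
        w = min(1.0, w * boost)
    return w

def relevance(src, dst, context, max_hops=4):
    """Best product-of-weights path score between two concepts (tiny DFS)."""
    best, stack = 0.0, [(src, 1.0, {src}, 0)]
    while stack:
        node, score, seen, hops = stack.pop()
        if node == dst:
            best = max(best, score)
            continue
        if hops == max_hops:
            continue
        for a, b in edges:
            nxt = b if a == node else a if b == node else None
            if nxt is not None and nxt not in seen:
                stack.append((nxt, score * weight(node, nxt, context), seen | {nxt}, hops + 1))
    return best

# Without context, "rehearsal" and "beat" look only weakly related;
# activating ContextNodes drawn from the user's story raises the path score.
story_context = {"dance", "performance", "music"}
print(relevance("rehearsal", "beat", set()))           # ~0.31
print(relevance("rehearsal", "beat", story_context))   # 1.0

The point is only that the same path scores much higher once the story supplies the context the two sentences never state.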
Thursday, October 06, 2005
Video Sequencing Weekly Report Oct 5
What I have done this week:
1. Watching as many clips of Roballet as I can
2. Re-reading Barbara’s thesis
3. Formulating the system to be built
Describing the scene instead of keeping the words literally
Example 1: “He is joking” instead of “I’ll give you one dollar”
Example 2: “He welcomes the kids” instead of “…be responsible and respectful”
Sometimes such high-level annotations serve as conclusions, derived not only from the annotated clip but also from many others, and they may be highly subjective.
Example: “…, which is something no one else does before”
In this example, the meaning of the annotated words is never shown explicitly in the clip; the annotator supplies it based on his/her own judgment.
This is related to another issue: the subjectivity of the annotation. The annotator’s subjectivity may influence how the edited/suggested sequence turns out.
Example: “They also create a fun pose for everyone…” vs. “Henri directs the group to create a fun pose for everyone”
At other times the annotations might not be appropriate enough:
The annotation focuses on only one particular event (maybe even a minor one, in terms of time span), so that other parts are neglected.
The words are abbreviated/inaccurate, e.g., “Christen can do it!” instead of “Christen thinks she can do it!”
The annotator types the conversation from the video verbatim, instead of annotating “someone is saying that…” from a viewer’s perspective.
As annotations accumulate with more clips, they get more and more abbreviated, and the data becomes increasingly insufficient. So we probably need a hierarchical knowledge representation that captures what is really going on over time at different levels, so that the system can leverage the higher-level knowledge to interpret/use the clips when the information carried by individual annotations is poor.
Would it be necessary for users to input a summarized description for a set of clips or a set of events? This question remains to be answered.
Sometimes the annotator mentions how things evolve, but not how they end up, which causes data insufficiency.
Example: “but are forgetting one minor detail…”
The title of the clip provides important information.
Example 1: “Where’s Gustave?” suggests that Gustave may be the protagonist in this clip, and that people are probably looking for him.
Example 2: “A Director’s Thank” may be used to infer that Henri, Jacques, or someone else who could be the director is the main character here, and that the director is presumably thanking someone.
Irony/jokes/sarcasm should probably not appear in the annotations.
Example: “The kids sure have learned!” does not state what it means explicitly.
2. Barbara’s Thesis
The creative need to see multiple story possibilities in real time during editing drives the development of a technology that can think with the videographer about the many stories that exist in her video collection.
In Barbara’s initial attempt, providing formal suggestions didn’t work well because “hard-coded rules that direct a videographer to take an establishing shot…enforce a particular narrative filmmaking style”, while a user’s decision to take a shot actually depends on the content and the context, which may include a lot of subjective information. In my case, however, I don’t think the way I’m using ConceptNet or other commonsense tools to build the story narrative would be too far from users’ idea of narrative, because the annotations are themselves already subjective. Suggested sequencings are generated depending on what users annotate the clips with and how they annotate them.
“The novice videographers said they would like to use the ConceptNet suggestions in a shoot over the course of a day instead of in such a constrained shoot with an assigned subject.” I think ConceptNet is more suitable for relating the concepts of different events in a broader sense, which makes it helpful at the story level rather than at the level of clips or raw footage. As for StoryNet, “[it] could help because … [it can] predict not only the next possible event but many event steps into the future.” StoryNet might be put to use if what it captures is higher-level information – at least from my current point of view. If the steps it suggests are too low-level, then I would be skeptical about its helpfulness.
Questions:
To what degree should I take into account full story understanding? Or do I only need to focus on each of the steps in the story plot?
In Barbara’s work, “inference at each step is not influenced by all previous annotations.” Would that be suitable for my system? What are the reasons for following (or not) this kind of rule?
About the representation of the story, I haven’t checked Mueller’s notions of goals, plans, themes, spaces, and time in the book Story Understanding. How much do I need from existing methods coping with computational documentary challenges, such as representing videos, creating continuity, and structuring the story?
“Brooks’ word” example. Do we need a better StoryNet? Or is what we need simply a new way of using ConceptNet?
The System
1. Filter out the stop words
2. Reinforce the influence of earlier events by traversing their concepts in a decayed fashion.
3. Build the relationship matrix: for each (event, event) pair, find a correlation value.
4. Determine the eligible “conceptually related” event pairs using a threshold value (see the sketch below).
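A minimal sketch of these four steps, under stated assumptions: the stop-word list, annotations, decay rule, and threshold are all stand-ins, and exact word overlap substitutes for the real ConceptNet-based correlation.

# Minimal sketch of the four steps above; all data and constants invented.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "and", "he", "she",
              "for", "because", "cannot", "of"}

# Hypothetical annotations, one string per event.
events = [
    "the director arranges the performers for a final rehearsal",
    "tom is frustrated because he cannot find the beat",
    "the group creates a fun pose for everyone",
]

def concepts(text):
    """Step 1: tokenize and filter out the stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def decayed_concepts(index, decay=0.5):
    """Step 2: an event keeps the concepts of earlier events too,
    weighted by a factor that decays with distance back in time."""
    weights = {}
    for j in range(index, -1, -1):
        for c in concepts(events[j]):
            weights[c] = max(weights.get(c, 0.0), decay ** (index - j))
    return weights

def correlation(wa, wb):
    """Step 3: correlation of two weighted concept sets (weighted overlap)."""
    return sum(wa[c] * wb[c] for c in set(wa) & set(wb))

matrix = [[correlation(decayed_concepts(i), decayed_concepts(j))
           for j in range(len(events))] for i in range(len(events))]

# Step 4: keep only the event pairs whose correlation clears the threshold.
THRESHOLD = 0.5
related = [(i, j) for i in range(len(events))
           for j in range(i + 1, len(events)) if matrix[i][j] >= THRESHOLD]
print(related)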
Video Sequencing Weekly Report Sep 28 (last week)
What I have done this week:
1. Taking videos for the Media Lab Picnic Event
2. Writing annotations for all 46 clips of the Thai Night video event
3. Trying a little editing on Thai Night
4. Watching
5. Borrowing books about directing/screenplay writing, and reading some articles in them.
6. Coming up with the idea “The process of explanation” as well as its relationship with ConceptNet
And the ideas I got from the above work:
1.
1. It tells the story steadily, not entertainingly.
2. In terms of narrative, there’s only one timeline. The events are chronologically sequenced, and this timeline follows the evolution of people’s opinions as well as the construction of the building.
3. Often, the environment is introduced first when a scene is about to be shown.
4. The whole plot is structured hierarchically. The broadest level covers the years; the finest level is the individual shots. In the middle are the events, which may be composed of several shots or even several sub-events. At the broadest level and the event level, the years and the events are sequenced chronologically, while the shots within one event may or may not follow the time order.
5. The shots are not necessarily organized exclusively; they can also overlap or intersect one another.
6. A speaker’s voice can be played as the audio while other related shots supply the video, with the speaker’s face hidden behind them.
And the questions:
2. Distinguishing the “sequencing” and “juxtaposing” processes for the shots
I think it’s necessary, when thinking about the video sequencing problem, to separate the ideas of “sequencing” and “juxtaposing” the shots. To my understanding, when I think of “juxtaposing” shots, what I see are the details of how shots intersect or overlap with one another. “Sequencing” the shots, on the other hand, operates in a comparatively wider sense that involves the flow of the whole story. If we can look at the flow and think about how we should develop the narrative for the whole story told by the video, we can focus on the sequencing work instead of mixing the two together.
3. My idea, “The process of explanation”
A scene can consist of one or more shots; it represents a place in the story where different thoughts are collected.
Tuesday, October 04, 2005
Jazz
The first time I saw my guitar teacher was at one of the welcome parties at the S&P dorm.
The jazz band that night was great, especially the drummer, a total show-off.
Francis and I had just finished swimming, and we walked to S&P in the pouring rain wearing garbage bags.
The moment I heard there was jazz, I forgot all about being hungry,
grabbed some random food, and sat down on the sofa up front to watch.
Only partway through did I find out it was a band of Berklee faculty, though supposedly the old black guy on guitar was a Berklee student.
We laughed and said, so where there's a will there's a way,
and thought, you're never too old to study music at Berklee.
Afterwards I went up front to chat with him and told him I really liked his playing.
He handed me a business card, and only then did I realize:
damn, he's a professor, not a student at all.
I asked whether he'd take me on if I wanted to study with him.
To my surprise, he wrote two phone numbers on the card without my even asking.
It caught me completely off guard.
Later, after I failed the audition for the Jazz Ensemble,
the ensemble's director said he'd refer me to a small combo to practice with,
one led by a teacher (which I later found out is even better),
but that I'd have to find private lessons on my own.
Only then did I actually contact this black Berklee professor.
We set a time: Monday at five in the afternoon at his place, 12 William court, park street.
Then on the phone he named his price: $70 an hour.
I froze on the spot and said, well, let's meet and talk first,
and see whether I'm really a good fit before anything else.
So last Monday I went.
Before going I was still telling 小鄧 that it'd probably be a one-time thing; it was just too expensive.
Five o'clock came, but we didn't start until well past five-thirty because he was on the phone with an old friend.
He asked me some questions, and while I answered he just kept playing away by himself.
The whole time I was talking I kept thinking, hey, are you even listening?
Then we picked a standard I can play without a chart - Blue Bossa -
and started playing straight away so he could get a rough sense of my ability.
He ended up soloing forever on his own while I just comped along beside him nonstop,
thinking to myself,
don't tell me this is going to be like learning guitar back at 海國:
the teacher walks in, sits down, plays like mad until he's satisfied, and class is over,
and all the student learns is "teacher, you're amazing."
The more I thought about it, the gloomier I got.
Then he asked whether I had his book.
"You don't have my book? You should buy my book."
Great, I thought, I've really found myself a teacher out to fleece me. Well done.
It wasn't until we genuinely started talking that things began to change.
I said what I wanted to learn, and where I thought I was fine and where I wasn't.
He countered that he didn't see it that way; my actual ability and what I said about it were two completely different things.
"That is something I disagree"
(even though he also said I was already far better than he'd imagined,
and that it surprised him I could play at this level).
He said it isn't as simple as buying a box of cereal at the supermarket, tearing it open, and pouring it into a bowl.
"We're making a gourmet meal, a fine meal here"
Learning jazz is like preparing a sumptuous Chinese feast:
the preparation alone, before any cooking, takes a long, long time.
If I wanted to reach the top in one leap, that he couldn't do.
"I don't make magic. I make musicians"
"And I'm not making just musicians. GREAT musicians"
After hearing all that,
I suddenly wondered: how did I become the presumptuous one here?
I thought it over: he had in fact been listening to everything I played,
solos and comping alike; even when he seemed to be drifting off, his ears were wide open.
And so that was that.
"Ok, let's work together," I said,
perfectly willing to pay NT$9,000 a month to learn guitar,
because he's not just a Berklee professor; he's also one cocky, supremely confident guy.
At a first meeting, the teacher picks the student, and the student picks the teacher too.
After the lesson I kept debating whether to just eat at Porter before going back to the lab,
since Francis, who'd said we'd eat together, wasn't answering his phone.
After dithering forever I decided to head back first, since lugging a guitar around in the rain is no good.
Back at the lab, none of them were there, which was seriously annoying. What's with saying we'd have dinner and then standing me up?
I set down the guitar, opened my computer, and logged into my mailbox.
Another surprise was waiting:
jazz combo rehearsal at seven-thirty. The sender was Keala, presumably the teacher leading the combo.
So Mondays have turned into guitar day.
Ignoring my loudly growling stomach, I picked up my Gibson and walked out again.
There was no way I was passing up this chance.
What I got was a baptism by fire.
Back in Taiwan we'd spend two or three weeks on a single song:
take the chart home and study it slowly, and the teacher set a gentle pace.
But at combo rehearsal the chart is handed out, and two minutes later it's one-two-three, go.
There's no time to figure out what key to play here and what key to play there.
I was nervous from start to finish, terrified of becoming exactly what my teacher had described that afternoon:
"making the band want to throw your amp out the door, and then throw you out after it."
I flailed around and didn't produce a single decent phrase the whole time.
The funny part is that apart from Keala, nobody there was older than me,
yet ten-plus years of playing (I mean jazz) was everywhere,
and one of them had been playing since fourth grade.
After rehearsal, walking out of the dorm where the practice room is,
rain drifting down, I asked Keala, still shaken, whether I could really join.
"as long as you wanna play:)"
He really is a very warm bass player.
Two days later I went to buy the textbook, and only then discovered that my teacher, John Thomas,
is a seriously famous guy.
He has played with Chet Baker, Dizzy Gillespie, and plenty of other big names,
and the book he wrote is one of the best-selling textbooks at the Berklee bookstore.
Then I realized that NT$2,000 a lesson is actually not bad at all for a world-class teacher like this.
If anything, I got a bargain.
小鄧 and I talked about it and agreed:
think of it as me having passed on his pretty dorm for a worse, farther one,
and spending the money saved each month on guitar lessons.
A pretty good deal, really.
When 之瑜 heard I'd joined a combo
(that's the 趙之瑜 who sat below as judge back in the days of our trash-dumping contests),
he asked whether he could join in too, as a vocalist.
It didn't work out in the end,
but from him I learned that
Keala is a famous bass player too.
He's so young, young enough that I assumed he was a student like us;
it turns out he's a professor at the New School of Music.
He's super nice. When we play badly he says straight out that it's fine,
just smiling the whole time.
Today was the second rehearsal; I was supposed to have a guitar lesson in the afternoon, but my teacher took the day off.
I heard that MIT actually pays Keala to come lead this combo,
yet we get no credit and pay nothing; anyone who wants to come can, as long as there's an open spot.
It suddenly hit me:
damn, MIT students really have it good.
How does something this good even exist?
Apparently there are quite a few combos like this, but the spots are just about all taken now.
Getting in really was a lucky break.
After all this, the bottom line is
that I really want to play the guitar well.
I feel like my lessons before were like kids' English at 芝麻街美語 (the Sesame Street English cram school):
very enjoyable, but not much solid groundwork laid.
I can't say I learned nothing in three years;
my teacher gave us plenty of good concepts and instilled an important sense for the music.
But I never expected that coming here would make me feel I know so little.
Every time someone asks
what exactly I can play, I come up completely empty,
and in the end it's always that same Blue Bossa.
Now that I'm in Boston, if all I do is still just play for enjoyment,
that would be far too much of a waste.
Jazz
I love Jazz