Friday, November 18, 2005

Improvisation & Computing Technology

I read Paul Nemirovsky's "Improvisational Media Space: Architecture and Strategies for Evolution" tonight. It discusses how people improvise and how computers might be able to do the same, with the aim of inspiring human users during media creation. "Media" here can refer to text, music, audio, video, motion, and so on, though the paper focuses on musical improvisation.

This paper more than impressed me. Truly, I was excited. All of a sudden I felt my eyes open to something that really interests me. It seemed that something I really LOVE, in terms of academic research, could finally be found after all these years of searching for my real interests in the computing technology and science research playground. To me, improvisation is the process of letting ideas flow out of one's body naturally, fluently, and endlessly. Researchers working on signal-processing algorithms for digital music have put in a lot of effort and surely made contributions, since those algorithms are the basis of all music-processing technology. But what their work lacks, to me, is the spirit of music. Music technology is not only about making sound or identifying it. Recognizing and using music's power to move and touch people during creation and performance is more important, in my personal view. Freeing people from struggling during creation and performance, and giving them much easier ways to go, is something people need, yet few solutions are offered. I'm glad to see some work pursuing this direction. So many sentences in this paper speak to my true beliefs about improvisation (well, beliefs actually shared by the majority of musical improvisers), and to me it is the Media Lab's distinctive trait to make things that draw heavily on both the humanities and engineering. In my view, this is also the Media Lab's responsibility to the world.


Jazz Lessons

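Three weeks ago, John told me,
"You are promising."
That is because many Berklee students don't have this kind of talent.
Many of them have hands like a gorilla's and can't figure out how to control them.
"You got the brain, and you got the hands," he said.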

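What came next, however, was this:
I practiced the same song for a long, long time.
My fingers just couldn't press those chords properly, so for three weeks in a row he taught nothing new.
I really hadn't expected that there were still so many chords
my hands couldn't manage.
Only the day before yesterday did I finally keep up with my teacher's tempo and play through the whole song.
John said my hands were slowly starting to develop the right muscles,
and only then did I feel a small sense of accomplishment.
He said this tune, Dear Old Stockmarket, is almost the simplest one in the whole book.
I really have improved within three weeks, and I can tell that my playing sounds better.
I'm curious to see how things develop from here.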


Teachers

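During band rehearsal, Keala wanted to explain to me that when playing soy kalifa
I have to converse carefully with the other instruments (especially the drums, because the bebop rhythm is very distinctive).
So he took a small drum and drumsticks and had me "talk" with the drummer while everyone kept playing.
After I had practiced a few times, he let the others take turns at the game too.
He said, "All instruments are basically drums."
I had understood that idea long ago,
but nothing made as deep an impression as this game did.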

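John wanted to explain that when my hand is stretched very wide to play a difficult chord,
my index finger must not be affected at all by what my little finger does.
It has to stay glued to exactly the same spot on the fretboard.
He had me stand up and hold still no matter what, and then he pushed my leg with his hand.
Then we switched, and he had me push him.
Then he told me,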
"It's your fingers. You can control it.
You want it to move, it moves.
If you want it to hold on to it,
you can definitely make it too."

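Henry has many catchphrases that are distinctive and easy to remember.
Besides filler like "You know...you know that um...you know...", which means nothing in particular,
Hugo says Henry often brings up: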
"Computers are too dumb, because they don't have commonsense!"
"What would be the scenario?"
"You wanna nail down the scenario"
"You wanna focus on the commonsense tools, play with it,
and don't spend too much time digging into OOXX"
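Giving me a lot of freedom is what I like most about Henry.
We don't really click in conversation, and often what he finds funny doesn't amuse me,
but his broad, far-reaching vision, his daring to do what others don't dare, and his focus are something I really admire.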

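Glorianna's passion for filmmaking and storytelling
flows endlessly from the unguarded smiles in our conversations.
I love our weekly face-to-face chats.
They have grown from half an hour to nearly two hours,
from formal progress reports to today,
when we really talked about what I disagree with the media lab about,
where my passion lies,
and what I wish to find right here during the two years.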
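At last week's meeting she talked about the website they are building, which is part of her vision:
to let people around the world who come from broken families, who are grieving, or who feel lost
share their life experiences with each other through video, and find comfort and even strength.
The reasoning part of it is exactly the project I'm working on this semester.
When I heard that, I suddenly realized that what I'm doing is actually very meaningful,
unlike much of the research seen here or elsewhere, which makes no real difference.
Today she brought up another part of her vision.
The story was a book she made for her son as a birthday present, using photos and text.
Its theme is the question her son has long been trying to answer: "What matters?"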
Somehow,
somehow I might find a way to work on
how people live and think about their own lives,
which I care deeply about.
What I have found over this semester is that
stories are not simply stories for recreation;
they can mean a lot to us and our lives.
Helping people tell their own stories and hear the stories of others
is something meaningful that I can do for people here.

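Cindy's talk about theater and about people left me stunned for a long while,
because I realized that I have been as good as dead all along.
That the living may be dead inside is perhaps not so surprising,
but that someone who has died can go on living is deeply impressive.
How can one person give the people around them so much strength,
so much that years later a complete stranger who hears the story is still shaken?
I once had passion, but it disappeared without a sound as time passed by.
I've been thinking that I can't let myself go on like this.
That beautiful past, which made the journey worthwhile, should become a life that is worthwhile.
Otherwise, I might as well not live.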

Teachers. Teachers of craft, teachers of life.

I must be truly lucky.


Thursday, November 03, 2005

Problem Defined - Video Sequencing Weekly Report Nov 2

This week, I've been looking at more of the stories and annotations to come up with a better way of applying commonsense techniques to this problem domain, and doing some small experiments with ConceptNet. During our discussion we arrived at a new formalization of the problem and scenario, and both the overall picture and the next steps are now pretty clear.

Looking at the Stories and the Annotations

I looked at both sets of text and found several interesting things, listed below:

  • Synonyms – Similar but different terms tend to appear in different passages to describe the same meaning, e.g., "creating", "inventing", "making", and so on. The system must recognize that they convey the same idea if we wish to correlate the annotations with the stories.
  • Related concepts – Sometimes the situation above results not from synonyms but from things that are related in a certain context. For example, "John invented the dance" and "he created new steps" are more likely to point to the same event than "John invented the dance" and "John invented the animation".
  • Proper names – A single name that appears in different annotations, e.g., "Victoria", refers to the same person and offers many hints about causality. Identifying these names is also extremely important in finding the relationships among stories and annotations.
  • Although the events stored in StoryNet tend to be very low-level, I think it has the potential to be a useful tool. The newer version of StoryNet that Dustin Smith and others in the Commonsense Computing Group are pursuing attempts to show all the possible steps toward a given goal. It should be useful for filling the semantic gap where things are not stated explicitly in the stories or annotations. This can be considered a future step.
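
The first two observations can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `ACTIONS` and `TOPICS` tables are hand-written stand-ins for what a synonym source such as WordNet or a related-concept source such as ConceptNet might return, not calls to either tool, and `concepts` and `overlap` are hypothetical helper names.

```python
import re

# Hand-written stand-in for a synonym lookup: each surface form maps to one
# shared action label. These entries are assumptions for illustration.
ACTIONS = {
    "create": "create", "creating": "create", "created": "create",
    "invent": "create", "inventing": "create", "invented": "create",
    "make": "create", "making": "create", "made": "create",
}

# Hand-written stand-in for a related-concept lookup: nouns that belong to
# the same context share a topic label.
TOPICS = {
    "dance": "dance", "steps": "dance", "moves": "dance",
    "animation": "animation", "patterns": "animation",
}


def concepts(text):
    """Return the set of (kind, label) pairs recognized in the text."""
    found = set()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in ACTIONS:
            found.add(("action", ACTIONS[token]))
        if token in TOPICS:
            found.add(("topic", TOPICS[token]))
    return found


def overlap(first, second):
    """Count the concepts two sentences share."""
    return len(concepts(first) & concepts(second))


same_event = overlap("John invented the dance", "he created new steps")
other_event = overlap("John invented the dance", "John invented the animation")
print(same_event, other_event)
```

With these toy tables the first pair shares both an action and a topic, while the second pair shares only the action, which matches the intuition in the list above. Proper names and pronouns such as "he" are left out of the sketch.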

Finding the concept flow within the story

Then I tried to find the proper clips for a story manually, so that I could better understand how a human selects useful material over the rest.

  • Characters – Interestingly, almost all the stories follow one particular character, so finding all the annotations that concern this character would be the first step of the system's process.
  • Time – Because all stories are about time, and different parts of a story tend to be different sub-events happening in different time periods, dividing the story into parts using time information might be the second step after identifying the character(s).
  • Extracting the "story descriptors" from the story (e.g., characters, desires/goals, problems, does, emotions, results). I was actually not sure whether the system needs to sense these descriptors in the passages. It would help the system focus on a few things and filter out irrelevant utterances, but it would be harmful if the descriptors are chosen poorly.
  • Labeling the key ideas in the text. The key ideas here, in my view, are the values of the descriptors (for example, for the goal descriptor, the sentence "Gustave likes to create his own dance" has the key idea "create his own dance"). If we skip the step of extracting the story descriptors, the key ideas could be arbitrary verbs together with the objects they act on. (Again, the function of the story descriptors is to narrow the semantic processing down to examining a specific set of values.)
  • Finding the annotations related to the stories according to the labeled ideas.
  • Scope – This is actually the process of semantically matching parts of two sentences. But it is not reasonable for the system to take all the words in both sentences into account when computing the correlation, since the two are not descriptions of events of the same scope. The clip annotations tend to be subsets of the stories, so it makes sense to claim that an annotation is closely related to a story even if they share only one concept. So the approach I have come up with for now is the so-called "in-other-words" approach: rewrite every sentence in both the stories and the annotations into as many distinct sentences as possible, and match the sentences from both sets.

Right now, I don't think it makes much sense anymore for the system to deliberately link annotations using the context constructed by the stories. The context comes naturally when the story is input, and the selected annotations will naturally be contextually related as well if we select them according to the sentences in the story. So the problem becomes how to relate the annotations to the stories by listing the "in-other-words" sentences, and we no longer need to bother with the relationships among clips. The complexity of building this system is therefore somewhat lower, but I think it is actually a good interaction design and a more reasonable problem domain for commonsense computing to fit into.

All the problems of deciding which clip comes next are eliminated by having users input their own story for the video to be generated, because I don't need to find the causal relationships among the clips, or any other relationships. In other words, the task of constructing the narrative has been avoided. The remaining subtask is simply to compare two text passages: whether one contains the other, or whether they refer to the same fact.
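
If each passage is first reduced to a set of concepts, the comparison described here becomes a handful of set operations. The sketch below assumes that reduction has already happened; the `compare` function and the concept labels are hypothetical, chosen only to illustrate the idea.

```python
def compare(story, annotation):
    """Classify how two concept sets relate to each other."""
    story, annotation = set(story), set(annotation)
    if not story or not annotation:
        return "unrelated"
    if story == annotation:
        return "same fact"
    if annotation < story:
        return "story contains annotation"
    if story < annotation:
        return "annotation contains story"
    if story & annotation:
        return "partial overlap"
    return "unrelated"


# Illustrative concept labels, written by hand.
story = {"gustave", "create", "dance", "frustrated"}
print(compare(story, {"gustave", "create"}))
print(compare(story, {"gustave", "create", "dance", "frustrated"}))
print(compare(story, {"louis", "animation"}))
```

The hard part, turning free text into comparable concept sets, is exactly what the commonsense tools would have to provide.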

Once this stage can be carried out with commonsense reasoning techniques, we'll be able to proceed to the next stage: creating a narrative machine that gives us a story based on the material it has. For example, if the system wants to focus on emotional transitions, then clips about the same characters or group experiencing different emotions can be selected to form a narrative. This will be a higher-level look at the annotations.

Story:

Gustave is one of the dancers. He likes to suggest ideas for the dance. He gets very frustrated that he is only allowed to do what he is told. At one point he was crying. This happened during the first week. The second week was different. Dancers were encouraged to design their own dance. Gustave had a wonderful time.

Annotations:

(O) Gustave wants to be a dancer instead of orchestra in "RoBallet," so he invents some new moves. He shows them to Jacques, who likes how Gustave changes his level throughout the dance, which is something no one else has done.

(O) Gustave is upset because he wants to experiment with presenting the dances in different ways, but so far Jacques has been telling the kids what to do. Gustave just wants the chance to make something up himself.

(?) Louis begins his animation. Gustave gives him tips on how to create patterns, but ends up making the wrong design. After suffering a minor setback, Louis starts over and does it his way.

(?) The kids look for Gustave, who has been behind the screen working on animation. They want to include him in their dance.

(?) Gustave's mother watches the children perform the Finale, and she notices that they have made much progress.

(?) Jacques explains to the kids how their animation will be displayed on the screen during the dances. Gustave suggests that Mason and Tiffany perform "Pas de Deux" behind the screen, because the silhouettes will look interesting.

(?) Victoria, Louis, Cristen, and Gustave practice "The Curtain." Dufftin reminds them to show their claws!

(?) After lunch, the kids design and program animation that will be used in the performance. Here, Louis has an inspired vision of what he wants to create. He explains his ideas to Anindita and Gustave.

(?) Dufftin and Gustave's mother discuss teaching techniques and what works best for children. The consensus? Teaching recovery is most important.

(?) Director Henri would like to thank everyone involved in the creation of "RoBallet." Although the dance was inspired by Louis, Jacques reminds them that Gustave had the idea of making up their own moves first.

(X) Gustave, Cristen, Louis and Victoria practice "Curtain" a few times. Will the sensors work for the performance?

(X) Jacques instructs Gustave to look directly into the camera as he dances so the audience can get a good look at his face.

(X) Gustave has been given a part in the orchestra. With the cast complete, the kids are about to perform with him for the first time. Victoria adds Gustave to her introduction, but Henri says she should mention him with the dancers. Victoria complains that Henri should correct her after the show, and not embarrass her in the middle of it.

Extracting the Key ideas

Gustave is one of the dancers. He likes to suggest ideas for the dance. He gets very frustrated that he is only allowed to do what he is told. At one point he was crying. This happened during the first week. The second week was different. Dancers were encouraged to design their own dance. Gustave had a wonderful time.

He likes to suggest ideas for the dance

  • Gustave likes/wants to create
  • Gustave likes/wants to discuss
  • Gustave likes/wants to imagine
  • Gustave likes/wants to try different poses
  • Gustave likes/wants to move his body

He gets very frustrated that he is only allowed to do what he is told

  • Gustave feels bad to be restricted
  • Gustave wants to quit when being forced

At one point he was crying.

  • Gustave feels bad, unhappy, disappointed, frustrated

The second week was different.

  • Gustave is not feeling bad, unhappy, etc.
  • Gustave feels better

Dancers were encouraged to design their own dance.

  • Gustave/Dancers have their own design
  • Gustave/Dancers can use their imaginations
  • Gustave/Dancers are not restricted
  • Gustave/Dancers are praised
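
The matching step of the "in-other-words" approach can be sketched with rewrites like the ones above. This is a minimal illustration that assumes the rewrites are written by hand: the rewrites for the story sentence follow the "wants" reading of the key ideas listed above, whereas the rewrites attached to the two annotations (one of them shortened) are assumptions made up for this example, and `find_matches` is a hypothetical helper, not part of any existing tool.

```python
# Rewrites of one story sentence, taken from the key ideas listed above.
STORY_REWRITES = {
    "He likes to suggest ideas for the dance": {
        "Gustave wants to create",
        "Gustave wants to discuss",
        "Gustave wants to imagine",
    },
}

# Rewrites of two annotation sentences. These are assumptions for the example.
ANNOTATION_REWRITES = {
    "Gustave just wants the chance to make something up himself.": {
        "Gustave wants to create",
        "Gustave wants to decide for himself",
    },
    "Gustave's mother watches the children perform the Finale": {
        "Gustave's mother sees a performance",
    },
}


def find_matches(story_rewrites, annotation_rewrites):
    """Pair each story sentence with annotations sharing at least one rewrite."""
    matches = []
    for sentence, rewrites in story_rewrites.items():
        for annotation, other_rewrites in annotation_rewrites.items():
            shared = rewrites & other_rewrites
            if shared:
                matches.append((sentence, annotation, sorted(shared)))
    return matches


for sentence, annotation, shared in find_matches(STORY_REWRITES, ANNOTATION_REWRITES):
    print(f"{sentence!r} <-> {annotation!r} via {shared}")
```

Exact string equality is the crudest possible test. In practice the rewrites would need to be normalized before comparison, for example by mapping synonyms onto a shared label.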

Conclusion: Definition of the Problem and the Scenario

  • Only one character is considered over the course of a story.
  • Users input structured stories: character, goals/desires/problems, struggles, results
  • All the characters are described in a list, with their personalities and characteristics, so that the system can disambiguate which one a story refers to
  • The task is reduced from first creating the narrative of the story and then choosing the right clips to sequence, to the subtask of selecting the right annotated video clips by mapping the semantics of the input stories onto the annotations
  • So the input would be
    1. a structured text story provided by anyone who joined the Roballet event
    2. a set of video clips with arbitrary text annotations
    3. a list of descriptions of the characters that appear in any of the video clips
  • The system would comprise
    1. a website for gathering the stories from the people who participate in the event
    2. a natural language processor, or a modified version of MontyLingua, that processes the stories as well as the annotations to identify the characters, the actions they take, the emotions they exhibit, and so on
    3. a related-concept provider, which may include ConceptNet, WordNet, and any other tool for finding possible synonyms and concepts
      (Note that ConceptNet and WordNet have different capabilities and can be used to find different information. For the example "create," WordNet can be used to find the synonym "invent," whereas ConceptNet can be used to find the related concept "imagine", which does not share the same meaning directly but is still highly related.)
    4. a central module that takes care of the whole data flow in the system, matches the concepts/words derived from the stories and the annotations, and finally presents the results
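
To make the data flow concrete, here is a rough Python sketch of the central module. Everything in it is an assumption for illustration: the `StructuredStory` and `Clip` classes, the `select_clips` function, and the `RELATED` table, which stands in for the related-concept provider rather than calling ConceptNet or WordNet. The natural language processor is reduced to lowercased word splitting, the character list is omitted, and the annotations are shortened from the ones quoted earlier.

```python
import re
from dataclasses import dataclass, field


@dataclass
class StructuredStory:
    """A user's story, already split into the structured fields."""
    character: str
    goals: list = field(default_factory=list)
    struggles: list = field(default_factory=list)
    results: list = field(default_factory=list)


@dataclass
class Clip:
    clip_id: str
    annotation: str


# Hand-written stand-in for the related-concept provider (an assumption).
RELATED = {
    "create": {"create", "invents", "design", "make"},
    "frustrated": {"frustrated", "upset", "crying"},
}


def expand(keyword):
    """Return the keyword together with its related words, if any are known."""
    return RELATED.get(keyword, {keyword})


def words(text):
    """Crude stand-in for the language processor: lowercase word set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_clips(story, clips):
    """Keep clips that mention the character and share a concept with the story."""
    wanted = set()
    for keyword in story.goals + story.struggles + story.results:
        wanted |= expand(keyword)
    scored = []
    for clip in clips:
        clip_words = words(clip.annotation)
        if story.character.lower() not in clip_words:
            continue  # step 1: the clip must concern the story's character
        score = len(wanted & clip_words)
        if score > 0:
            scored.append((score, clip.clip_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [clip_id for _, clip_id in scored]


clips = [
    Clip("c1", "Gustave invents some new moves."),
    Clip("c2", "Gustave is upset because Jacques has been telling the kids what to do."),
    Clip("c3", "Gustave's mother watches the children perform the Finale."),
    Clip("c4", "Louis begins his animation."),
]
story = StructuredStory("Gustave", goals=["create"], struggles=["frustrated"])
print(select_clips(story, clips))
```

Note that a single shared concept is enough for a clip to be selected, in line with the earlier point that annotations tend to be subsets of the stories.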

