Thursday, October 06, 2005

Video Sequencing Weekly Report Oct 5

What I have done this week:

1. Watching as many clips of Roballet as I can

2. Re-reading Barbara’s thesis

3. Formulating the system to be built

Ideas:
1. Roballet

Annotations are at a higher level than the actual actions/words in the conversation:

Describing the scene instead of keeping the words literally

Example 1: “He is joking” instead of “I’ll give you one dollar”

Example 2: “He welcomes the kids” instead of “…be responsible and respectful”

Sometimes such high-level annotations can serve as conclusions, derived not only from the annotated clip but also from many others, and these conclusions may be highly subjective.

Example: “…, which is something no one else does before”

In this example, the meaning of the annotated words is never shown explicitly in the clip; it comes from the annotator’s own judgment.

This is related to another issue: the subjectivity of the annotation. The annotator’s subjectivity may influence how the edited/suggested sequence turns out.

Example: “They also create a fun pose for everyone…” vs. “Henri directs the group to create a fun pose for everyone”

There are other times when the annotations might not be adequate:

The annotation focuses on only one particular event (perhaps even a minor one, in terms of time span), so other parts may be neglected.

The words are abbreviated/inaccurate, e.g., “Christen can do it!” instead of “Christen thinks she can do it!”

The annotator directly types the words spoken in the video, instead of annotating “someone is saying that…” as a viewer

As annotations accumulate over more clips, they become more and more abbreviated, and the data becomes increasingly insufficient. So we probably need a hierarchical knowledge representation that captures what is really going on over time at different levels, such that the system can leverage higher-level knowledge to interpret/use the clips when the information carried by individual annotations is poor.
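Just to make the idea concrete for myself, here is a rough Python sketch of what such a hierarchy might look like. All the level names and classes (Clip, Event, Story) are placeholders I made up for illustration, not a committed design:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Clip:
    title: str                          # e.g. "Where's Gustave?"
    annotations: List[str] = field(default_factory=list)

@dataclass
class Event:
    label: str                          # mid-level description, e.g. "looking for Gustave"
    clips: List[Clip] = field(default_factory=list)

@dataclass
class Story:
    theme: str                          # top-level summary across events
    events: List[Event] = field(default_factory=list)

def interpret(story: Story, clip: Clip) -> List[str]:
    """If a clip's own annotations are too sparse, fall back on the
    enclosing event's label and the story-level theme."""
    if clip.annotations:
        return clip.annotations
    for event in story.events:
        if clip in event.clips:
            return [event.label, story.theme]
    return [story.theme]

The point of interpret() is exactly the fallback behavior described above: when an individual annotation carries too little information, the system can still reason with the event-level and story-level descriptions.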

Would it be necessary for users to input a summarized description for a set of clips or a set of events? This is a question to be answered.

Sometimes the annotator mentions how things evolve, but not how they end up, which causes data insufficiency.

Example: “but are forgetting one minor detail…”

The title of the clip provides important information.

Example 1: “Where’s Gustave?” suggests that Gustave may be the protagonist in this clip, and that people are probably looking for him.

Example 2: “A Director’s Thank” suggests that whoever could be the director, Henri, Jacques, or someone else, is likely the main character here, and that the director is presumably thanking someone.

Irony/jokes/sarcasm might not be appropriate in the annotations.

Example: “The kids sure have learned!” does not state its meaning explicitly.


2. Barbara’s Thesis

The creative need to see multiple story possibilities in real time during editing drives the development of a technology that can think with the videographer about the many stories that exist in her video collection.

In Barbara’s initial attempt, providing formal suggestions didn’t work well because “hard-coded rules that direct a videographer to take an establishing shot…enforce a particular narrative filmmaking style”, while users’ decisions about making a shot actually depend on the content and the context, which may include a lot of subjective information. In my case, however, I don’t think the way I’m using ConceptNet or other commonsense tools to build the story narrative would be too far from users’ idea of the narrative, because the annotations are themselves subjective already. Suggested sequences are generated based on which clips users annotate and how they annotate them.

“The novice videographers said they would like to use the ConceptNet suggestions in a shoot over the course of a day instead of in such a constrained shoot with an assigned subject.” I think ConceptNet is a more suitable tool for relating the concepts of different events in a broader sense, which makes it helpful at the story level rather than at the level of clips or raw footage. As for StoryNet, “[it] could help because … [it can] predict not only the next possible event but many event steps into the future.” StoryNet might be put to use if what it captures is higher-level information, at least from my current point of view. If the steps it suggests are too low-level, then I would be skeptical about its helpfulness.
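As a toy illustration of that quoted capability, the sketch below chains predictions several event steps into the future. The next_events table and its contents are invented for illustration only; this is not StoryNet’s actual interface or data:

def next_events(event):
    """Hypothetical stand-in for a StoryNet-style lookup that returns
    plausible follow-up events for a given event. Toy table only."""
    toy_storynet = {
        "welcome the kids": ["explain the activity"],
        "explain the activity": ["kids rehearse", "kids ask questions"],
        "kids rehearse": ["perform for audience"],
    }
    return toy_storynet.get(event, [])

def predict_chain(event, steps=3):
    """Follow predicted events several steps into the future, which is
    the capability the thesis attributes to StoryNet."""
    chain = [event]
    for _ in range(steps):
        followups = next_events(chain[-1])
        if not followups:
            break
        chain.append(followups[0])   # take the most plausible follow-up
    return chain

# predict_chain("welcome the kids")
# -> ['welcome the kids', 'explain the activity', 'kids rehearse',
#     'perform for audience']

If the chained events stay at this descriptive level, they could help order clips at the story level; if they are much lower-level than this, I remain skeptical.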

Questions:

To what degree should I take full story understanding into account? Or do I only need to focus on each step in the story plot?

In Barbara’s work, “inference at each step is not influenced by all previous annotations.” Would that be suitable for my system? What are the reasons for following (or not following) this kind of rule?

About the representation of the story: I haven’t checked Mueller’s notions of goals, plans, themes, spaces, and time in the book Story Understanding. How much do I need from the existing methods for coping with computational documentary challenges, such as representing videos, creating continuity, and structuring the story?

The “Brooks’ word” example: do we need a better StoryNet? Or is what we need simply a new way of using ConceptNet?

The System

1. Filter out the stop words

2. Reinforce the influence of former events by traversing their concepts in a decayed fashion.

3. Build the relationship matrix: for each (event, event) pair, compute a correlation value.

4. Determine the eligible “conceptually related” event pairs using a threshold value (a rough sketch of these steps follows).
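Here is a minimal end-to-end sketch of the four steps in Python. The stop-word list, the decay factor, and the plain weighted word-overlap correlation are all placeholders; in particular, the overlap score stands in for whatever ConceptNet-based measure we end up using, and step 2 is interpreted as carrying each event’s concepts forward with geometrically decaying weights:

STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in"}  # toy list

def extract_concepts(annotation):
    # Step 1: filter out the stop words; treat the remaining words as concepts.
    return [w for w in annotation.lower().split() if w not in STOP_WORDS]

def concept_weights(annotations, decay=0.5):
    # Step 2: reinforce the influence of former events by carrying their
    # concepts forward with geometrically decaying weights.
    carried, per_event = {}, []
    for annotation in annotations:
        carried = {c: w * decay for c, w in carried.items()}
        for c in extract_concepts(annotation):
            carried[c] = carried.get(c, 0.0) + 1.0
        per_event.append(dict(carried))
    return per_event

def relationship_matrix(per_event):
    # Step 3: for each (event, event) pair, compute a correlation value.
    # Weighted concept overlap stands in for a ConceptNet-based score.
    n = len(per_event)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = set(per_event[i]) & set(per_event[j])
            matrix[i][j] = sum(min(per_event[i][c], per_event[j][c])
                               for c in shared)
    return matrix

def related_pairs(matrix, threshold=1.0):
    # Step 4: keep the event pairs whose correlation clears the threshold.
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if matrix[i][j] >= threshold]

Given a list of annotation strings, related_pairs(relationship_matrix(concept_weights(annotations))) returns the index pairs of events the system would treat as conceptually related.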
