[DVDFolks] Crew Algorithm (was [ANNOUNCE] 0.8 Preview available)

Doug Wright doug at dougweb.org
Mon Dec 27 15:00:26 PST 2004


Kyle Bushman wrote:

> Sorry I misunderstood what you wrote about what you removed.  In 
> regards to that, I don't know anything about programming but is there 
> a way that you can allow things with a colon to be allowed but then 
> take away everything that has "second unit" in the job description.
>
I'll try and explain in a non-technical way what DVDFolks does/did, and 
if you (or anyone else) can come up with a better way, I'll certainly 
implement it!

The IMDb can (and does) use a wide variety of words and punctuation in 
the role descriptors. DVDFolks (not having human intelligence) must be 
guided with a strict set of rules as to what to include, and what to 
ignore. This is done in 3 stages.

Stage 1: Get rid of as much as possible from the role names to make the 
rest simpler. Things like credit names and episode participation are 
stripped here.

Stage 2: Get rid of 'lesser' roles. If the role includes words like 
'assistant' or 'additional', then this credit is ignored. A list of key 
words is used here, and each one is matched anywhere in the role text, 
e.g. 'unit' will match 2nd unit/aerial unit/undersea unit etc.

Stage 3: Try to match the role text of the remaining crew to DVD 
Profiler's list, e.g. convert 'based on book' to 'writer'. Leave out 
anything that cannot be matched.
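
As a rough illustration only (this is not the actual DVDFolks code), the 
three stages could be sketched in Python as below. The key word list, the 
role table and the function names are all made up for the example, and 
stage 1 assumes that credit names and episode counts appear in 
parentheses, which the description above does not actually say.

```python
import re

# Hypothetical data: the real key word list and DVD Profiler's role list
# are not given in this post.
LESSER_KEYWORDS = ["assistant", "additional", "unit", "line"]
ROLE_MAP = {
    "director": "director",
    "producer": "producer",
    "based on book": "writer",
}


def stage1_simplify(role):
    """Stage 1: strip parenthesised notes and normalise case and spacing.

    Assumption: credit names and episode participation look like
    '(as Some Name)' or '(3 episodes)'.
    """
    role = re.sub(r"\([^)]*\)", "", role)
    return " ".join(role.lower().split())


def stage2_is_lesser(role):
    """Stage 2: a role is 'lesser' if any key word appears anywhere in it."""
    return any(word in role for word in LESSER_KEYWORDS)


def stage3_match(role):
    """Stage 3: map the remaining text to a known role, or None if unknown."""
    return ROLE_MAP.get(role)


def classify(raw_role):
    """Run all three stages; None means the credit is left out."""
    role = stage1_simplify(raw_role)
    if stage2_is_lesser(role):
        return None
    return stage3_match(role)
```

With these made-up lists, classify("Based on book (3 episodes)") gives 
'writer', while classify("Director: 2nd unit") and 
classify("Executive Producer: New Line Cinema") both give None, because 
'unit' and 'line' are found by plain substring matching in stage 2.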

The specific case you found was the role 'Executive Producer: New Line 
Cinema'. We don't want line producers included so the word 'line' is 
included in the list used for stage 2. This matched the given role, so 
the credit was ignored. I then amended stage 2 to stop the search 
whenever a colon was encountered, so 'Line Producer' would match and 
'Executive Producer: New Line Cinema' wouldn't because the 'bad' word 
was after the colon. But this then lets 'sound mixer: 2nd unit' type 
roles through, and there are a lot more of them! It's only New Line 
that's a problem; a similar credit using Paramount Pictures would get 
through, since it contains none of the key words.
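
To make the trade-off concrete, here is a small Python sketch of the two 
versions of the stage 2 check described above. It is an illustration 
under assumptions, not the real code: the key word list and both function 
names are invented for the example.

```python
# Hypothetical key word list; the real one is not given in this post.
LESSER_KEYWORDS = ["assistant", "additional", "unit", "line"]


def is_lesser_original(role):
    """Original check: search the whole role text for any key word."""
    role = role.lower()
    return any(word in role for word in LESSER_KEYWORDS)


def is_lesser_stop_at_colon(role):
    """Amended check: only search the text before the first colon."""
    head = role.lower().split(":", 1)[0]
    return any(word in head for word in LESSER_KEYWORDS)


if __name__ == "__main__":
    for role in [
        "Line Producer",
        "Executive Producer: New Line Cinema",
        "Executive Producer: Paramount Pictures",
        "sound mixer: 2nd unit",
    ]:
        print(role, is_lesser_original(role), is_lesser_stop_at_colon(role))
```

Under these assumptions the original check wrongly drops 'Executive 
Producer: New Line Cinema', and the colon version keeps it but also 
keeps 'sound mixer: 2nd unit', which is the problem described above.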

I haven't given up hope yet; I have a couple of half-baked ideas that 
need exploring to check how much they'll slow down the program.

I've also just had an idea that's so simple, there's got to be a catch 
to it somewhere....

-- 
Doug

Email:     doug at domainbelow      Reclaim your inbox!: getthunderbird.com
Web:        www.dougweb.org      Rediscover the web!: getfirefox.com

Users of Outlook (Express) *please* keep my real address out of your 
address book! I'm fed up with spam and virus of the week emails...



