[DVDFolks] Crew Algorithm (was [ANNOUNCE] 0.8 Preview available)
Doug Wright
doug at dougweb.org
Mon Dec 27 15:00:26 PST 2004
Kyle Bushman wrote:
> Sorry I misunderstood what you wrote about what you removed. In
> regards to that, I don't know anything about programming but is there
> a way that you can allow things with a colon to be allowed but then
> take away everything that has "second unit" in the job description.
>
I'll try and explain in a non-technical way what DVDFolks does/did, and
if you (or anyone else) can come up with a better way, I'll certainly
implement it!
The IMDb can (and does) use a wide variety of words and punctuation in
the role descriptors. DVDFolks (not having human intelligence) must be
guided with a strict set of rules as to what to include, and what to
ignore. This is done in 3 stages.
Stage 1: Get rid of as much as possible from the role names to make the
rest simpler. Things like credit names and episode participation are
stripped here.
Stage 2: Get rid of 'lesser' roles. If the role includes words like
'assistant' or 'additional', then this credit is ignored. A list of key
words is used here, e.g. 'unit' will match 2nd unit/aerial unit/undersea
unit etc.
Stage 3: Try and match the role text of the remaining crew to DVD
Profiler's list, e.g. convert 'based on book' to 'writer'. Leave out
anything that cannot be matched.
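For anyone who finds code easier to follow, here is a rough Python sketch
of the three stages described above. It is not the actual DVDFolks code:
the keyword list, the role table, the function names, and the assumption
that credit names and episode notes appear in parentheses are all
invented for illustration. Keywords are matched as plain substrings.

```python
import re

# Hypothetical stage 2 keyword list; the real list is not shown in this post.
LESSER_KEYWORDS = ["assistant", "additional", "unit", "line"]

# Hypothetical stage 3 table mapping IMDb role text to DVD Profiler roles.
ROLE_MAP = {
    "based on book": "writer",
    "director": "director",
    "executive producer": "producer",
}

def simplify(role):
    """Stage 1: strip parenthesised notes (assumed to hold credit names
    and episode participation), lower-case, and collapse whitespace."""
    role = re.sub(r"\([^)]*\)", "", role)
    return " ".join(role.lower().split())

def is_lesser(role):
    """Stage 2: True if any keyword occurs anywhere in the role text."""
    return any(word in role for word in LESSER_KEYWORDS)

def match_role(role):
    """Stage 3: return the DVD Profiler role, or None if nothing matches."""
    return ROLE_MAP.get(role)

def process(role):
    """Run one IMDb role through all three stages; None means 'ignore'."""
    role = simplify(role)
    if is_lesser(role):
        return None
    return match_role(role)
```

With these made-up tables, process("Based on book (as J. Smith)") gives
'writer', while process("Director: 2nd Unit") is dropped at stage 2 and
process("Key Grip") is dropped at stage 3 because it has no match.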
The specific case you found was the role 'Executive Producer: New Line
Cinema'. We don't want line producers included so the word 'line' is
included in the list used for stage 2. This matched the given role, so
the credit was ignored. I then amended stage 2 to stop the search
whenever a colon was encountered, so 'Line Producer' would match and
'Executive Producer: New Line Cinema' wouldn't because the 'bad' word
was after the colon. But this then lets 'sound mixer: 2nd unit' type
roles through, and there are a lot more of them! It's only New Line
that's a problem; a similar credit using Paramount Pictures would get
through.
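The trade-off described above can be shown with a small Python sketch.
Again, this is only an illustration and not the real DVDFolks code: the
keyword list and both function names are assumptions. The first function
searches the whole role text, the second stops at the first colon.

```python
# Hypothetical stage 2 keyword list; the real list is not shown in this post.
LESSER_KEYWORDS = ["assistant", "additional", "unit", "line"]

def is_lesser_whole(role):
    """Original rule: search the entire role text for a keyword."""
    text = role.lower()
    return any(word in text for word in LESSER_KEYWORDS)

def is_lesser_before_colon(role):
    """Amended rule: only search the text before the first colon."""
    head = role.lower().split(":", 1)[0]
    return any(word in head for word in LESSER_KEYWORDS)

ROLES = [
    "Line Producer",
    "Executive Producer: New Line Cinema",
    "Executive Producer: Paramount Pictures",
    "Sound Mixer: 2nd Unit",
]

# Print which roles each rule would ignore.
for role in ROLES:
    print(role, is_lesser_whole(role), is_lesser_before_colon(role))
```

Searching the whole text wrongly ignores the New Line Cinema credit, while
stopping at the colon keeps it but also keeps 'Sound Mixer: 2nd Unit'.
'Line Producer' is ignored and the Paramount credit is kept under both
rules.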
I haven't given up hope yet. I have a couple of half-baked ideas that
need exploring to check how much they'll slow down the program.
I've also just had an idea that's so simple, there's got to be a catch
to it somewhere....
--
Doug
Email: doug at domainbelow Reclaim your inbox!: getthunderbird.com
Web: www.dougweb.org Rediscover the web!: getfirefox.com
Users of Outlook (Express) *please* keep my real address out of your
address book! I'm fed up with spam and virus of the week emails...