goodguy's Credit Lookup Plus

GSyren (DVD Profiler Unlimited Registrant, Star Contributor)
Profiling since 2001
Registered: March 14, 2007
Reputation: Highest Rating
Sweden, Posts: 4,695
Just a hunch - could it be different name parsing?
There are three possible ways to add "Queen Elizabeth":
Queen Elizabeth//
Queen/Elizabeth/
Queen//Elizabeth
My freeware tools for DVD Profiler users.
Gunnar

mediadogg (DVD Profiler Desktop and Mobile Registrant)
Aim high. Ride the wind.
Registered: March 18, 2007
Reputation: Highest Rating
United States, Posts: 6,463
Quoting GSyren:
Quote:
Just a hunch - could it be different name parsing?
There are three possible ways to add "Queen Elizabeth":
Queen Elizabeth//
Queen/Elizabeth/
Queen//Elizabeth

Yeah, AiAustria told me before and I didn't get it. Now I do. I am matching CLT results now. Now to test and see if that fixes some of the other differences I found.

Problem was that even though I was using the search field firstname to match on creditedAs, the CLT will also match to the first name in the database, if the middle and last names are null. So, I was missing "Queen Elizabeth" as a first name.

So far, my code does not handle the parsing Queen/Elizabeth/. Should it? If so, it means the database is even more messed up than I thought.
Thanks for your support.
Free Plugins available here.
Advanced plugins available here.
Hey, new product!!! BDPFrog.

mediadogg
No sooner said than smacked on the head. I just resolved another missing profile: the database had a blank first name and used the middle and last name fields to hold "zhang ziyi".

Hopefully now, that gets me there.

At least for the simple stuff. But I have found examples of things like 2- and 3-word middle names.

So, if you start with a single field, with say 5 or 6 tokens, I guess the trick would be to generate all possible parsings into three fields and check for those (case insensitive) matches, along with a match on creditedAs.
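
For anyone who wants to try the "all possible parsings" idea outside the plugin, here is a minimal Python sketch. It assumes the name is split on whitespace, token order is kept, and any of the three fields may be empty. The function name `all_parsings` is made up for illustration and is not part of any DVD Profiler API.

```python
from typing import List, Tuple


def all_parsings(name: str) -> List[Tuple[str, str, str]]:
    """Return every way to split the name's tokens into (first, middle, last).

    Token order is preserved and any field may be empty.
    """
    tokens = name.split()
    n = len(tokens)
    parsings = []
    # i marks where the first field ends, j marks where the middle field ends.
    for i in range(n + 1):
        for j in range(i, n + 1):
            first = " ".join(tokens[:i])
            middle = " ".join(tokens[i:j])
            last = " ".join(tokens[j:])
            parsings.append((first, middle, last))
    return parsings


# Print the parsings in the First/Middle/Last notation used in this thread.
for first, middle, last in all_parsings("Queen Elizabeth"):
    print(f"{first}/{middle}/{last}")
```

A name with n tokens gives (n + 1)(n + 2) / 2 parsings under these assumptions, so 6 for two tokens, 21 for five tokens and 28 for six. That is small enough to check every one of them.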

It might be too late in the game for me to provide that parsing without a rewrite, which I am not motivated to do until / unless I get the required accuracy in the results, now that I understand how the search needs to work. But the variants can be loaded from a CSV file, so the combinations can be generated either by hand or by some other tool.

AiAustria (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Profiling since 2004
Registered: May 19, 2007
Reputation: Highest Rating
Austria, Posts: 5,715
Quoting mediadogg:
Quote:
... So, if you start with a single field, with say 5 or 6 tokens, I guess the trick would be to generate all possible parsings into three fields and check for those (case insensitive) matches, along with a match on creditedAs....

At least for the comparison it is not necessary to parse. It is easier to concatenate F/M/L to "F M L" and compare this string. That is what the CLT does.
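
A rough Python sketch of that concatenation idea, to show why the parsing stops mattering. How the CLT really does this internally is not shown in this thread, so treat the helper `concat_name` as a made-up illustration only.

```python
def concat_name(first: str, middle: str, last: str) -> str:
    """Join the non-empty name fields with single spaces ("F M L")."""
    return " ".join(part for part in (first, middle, last) if part)


# The three ways of entering "Queen Elizabeth" listed earlier in the thread.
encodings = [
    ("Queen Elizabeth", "", ""),
    ("Queen", "Elizabeth", ""),
    ("Queen", "", "Elizabeth"),
]

# All three encodings collapse to one comparable string.
joined = {concat_name(*fields) for fields in encodings}
print(joined)  # {'Queen Elizabeth'}
```

Skipping the empty fields before joining is what keeps stray double blanks out of the result.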

Concerning the Queen Elizabeth Problem: A little bit weird, the following titles can't be downloaded by Add-Multiple-UPC:
8-273770-005144
026359-924620
Although they can be found and selected for download manually...
Complete list of Common Names  •  A good point for starting with Headshots (and v11.1)

mediadogg
Quoting AiAustria:
Quote:
Quoting mediadogg:
Quote:
... So, if you start with a single field, with say 5 or 6 tokens, I guess the trick would be to generate all possible parsings into three fields and check for those (case insensitive) matches, along with a match on creditedAs....

At least for the comparison it is not necessary to parse. It is easier to concatenate F/M/L to "F M L" and compare this string. That is what the CLT does.

Concerning the Queen Elizabeth Problem: A little bit weird, the following titles can't be downloaded by Add-Multiple-UPC:
8-273770-005144
026359-924620
Although they can be found and selected for download manually...

Well of course that's how I started, but it isn't that simple. You don't understand the database structure. As I have explained before, the database does not have a single string to compare to. It has separate fields for first, middle, last and creditedAs. I can give the concatenated string to the CLT, but after scraping the UPCs, I then have to generate profile IDs from the UPCs and search the database for credits. There is not much point in scraping the CLT web pages any further. We already have CLTPlus.

Due to the inconsistent way people encode the data, you can find the whole name (as "queen elizabeth") in any one of those fields or split across them, depending entirely on how the user built the profile. The eleven credited profiles for queen elizabeth include:

- middle empty with "Queen Elizabeth" in credited as
- first empty with queen and elizabeth in the middle and last fields
- middle empty with first as queen and elizabeth as last
- middle and last empty with first as "queen elizabeth"

And I wouldn't be surprised to find other variations. And then there are cases of people putting in honorifics. I have seen a 3 word middle name. The database is a mess.

I wish I had implemented your idea of accepting a string and then constructing a table of all the possible variants for the user to pick from. I didn't get it at the time, sorry. But I do have the table, the ability to bulk process, and a way to fill the table from a file. So, if you want to make a little script to go from a string to all the variants, that would work. For now, I am trying to get the thing accurate and stable, and then maybe faster.

I wish I had the energy to start over, knowing what I know now ...

mediadogg
If there were an alternative web browser object, it would be so much easier and faster. Anybody find one, please let me know.  Maybe we even could chip in together and buy it?

Hey now! Free and Open Source, WinForms and WPF. Hmm .... look here.

Anybody interested? I'll share all I know to get you a head start on the scraping.

AiAustria
Quoting mediadogg:
Quote:
Well of course that's how I started, but it isn't that simple. You don't understand the database structure. As I have explained before, the database does not have a single string to compare to. It has separate fields for first, middle, last and creditedAs. I can give the concatenated string to the CLT, but after scraping the UPCs, I have to then generate profile IDs from the UPCs and then search the database for credits.

How do you search the data base?

I thought you have to fetch each profile with its UPC/locality before you can do anything further.

If yes, what prevents you from concatenating F/M/L after fetching the database record to get a comparable string?

mediadogg
Quoting AiAustria:
Quote:
Quoting mediadogg:
Quote:
Well of course that's how I started, but it isn't that simple. You don't understand the database structure. As I have explained before, the database does not have a single string to compare to. It has separate fields for first, middle, last and creditedAs. I can give the concatenated string to the CLT, but after scraping the UPCs, I have to then generate profile IDs from the UPCs and then search the database for credits.

How do you search the data base?

I thought you have to fetch each profile with its UPC/locality before you can do anything further.

If yes, what prevents you from concatenating F/M/L after fetched the data base record to get a comparable string?

One more time ... because there is no string in the database to compare with. Simple as that. To understand better what I am saying, look at the XML for any profile. How would you use a concatenated string to pull credits from the XML? (If you show me something I don't know, I will quickly and happily use it. Old dogg can learn new tricks.)

The XML somewhat reflects the database structure. A plugin makes calls to the database and gets back either the XML or program objects that have pretty much the same structure:

      <Credit FirstName="Alexander" MiddleName="" LastName="Payne" BirthYear="0" CreditType="Direction" CreditSubtype="Director" CreditedAs=""/>
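
To make the question concrete for exported profile XML, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It does not use the plugin API. The `<Credits>` wrapper, the "Jane Doe" row and the function name `find_credits` are assumptions made up for the example. Only the Alexander Payne line comes from a real profile.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the <Credits> wrapper and the Jane Doe row are hypothetical.
SAMPLE_XML = """<Credits>
  <Credit FirstName="Alexander" MiddleName="" LastName="Payne" BirthYear="0" CreditType="Direction" CreditSubtype="Director" CreditedAs=""/>
  <Credit FirstName="Jane" MiddleName="" LastName="Doe" BirthYear="0" CreditType="Writing" CreditSubtype="Writer" CreditedAs="J. Doe"/>
</Credits>"""


def find_credits(xml_text: str, query: str):
    """Return Credit elements whose joined F/M/L or CreditedAs equals the query."""
    wanted = query.strip().lower()
    if not wanted:
        return []
    hits = []
    for credit in ET.fromstring(xml_text).iter("Credit"):
        parts = [credit.get(key, "").strip()
                 for key in ("FirstName", "MiddleName", "LastName")]
        # Join only the non-empty fields so blank fields add no extra spaces.
        full = " ".join(p for p in parts if p).lower()
        credited_as = credit.get("CreditedAs", "").strip().lower()
        if wanted in (full, credited_as):
            hits.append(credit)
    return hits


print([c.get("LastName") for c in find_credits(SAMPLE_XML, "Alexander Payne")])  # ['Payne']
```

The string to compare against is built on the fly from each record, so nothing has to be stored as a single name field in the XML.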

AiAustria
Quoting AiAustria:
Quote:
How do you search the data base?

Maybe I am able to understand, if you can share the relevant part of the code... - ?

mediadogg
Quoting AiAustria:
Quote:
Quoting AiAustria:
Quote:
How do you search the data base?

Maybe I am able to understand, if you can share the relevant part of the code... - ?

I'll send you a PM. I am still honoring my contract with Invelos to not reveal the plugin API in public, although I'm not sure it matters these days.

mediadogg
Oh wait, I think I read your post backwards. What you are suggesting is in fact what I am doing. Maybe what I should also do is get rid of the variants table, just give a single-line text field, and leave it at that? (Of course I would also compare with creditedAs, along with the concatenated F/M/L.) I'll try that. No more variants table?

mediadogg
Ok, that did help by motivating me to simplify my code mish-mash. Now all the cases are covered, and it might be 1 nanosecond faster. Three brains are better than one.  And no need to throw away the variant table, since each row gets concatenated, you can still put in variants any way you want.
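
For illustration, this is roughly how a variants file could be collapsed into search strings in Python when every row is concatenated. The CSV layout (one variant per row as first,middle,last, no header) and the name `load_search_strings` are assumptions for this sketch. The plugin's actual file format is not described in this thread.

```python
import csv
import io

# Hypothetical variants file: one variant per row as first,middle,last.
VARIANTS_CSV = """Queen Elizabeth,,
Queen,Elizabeth,
Queen,,Elizabeth
,Zhang,Ziyi
"""


def load_search_strings(csv_text: str) -> set:
    """Concatenate each row's non-empty fields and return the distinct strings."""
    strings = set()
    for row in csv.reader(io.StringIO(csv_text)):
        joined = " ".join(field.strip() for field in row if field.strip())
        if joined:
            strings.add(joined.lower())
    return strings


print(sorted(load_search_strings(VARIANTS_CSV)))  # ['queen elizabeth', 'zhang ziyi']
```

Rows that differ only in how the name is split end up as the same string, so extra variants in the table do no harm.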

mediadogg
Quoting mediadogg:
Quote:
If there were an alternative web browser object, it would be so much easier and faster. Anybody find one, please let me know.  Maybe we even could chip in together and buy it?

Hey now! Free and Open Source, WinForms and WPF. Hmm .... look here.

Anybody interested? I'll share all I know to get you a head start on the scraping.

If that thing is as advertised, one could replace CLTPlus without a plugin, using WinForms or WPF, and it would be fast and avoid IE. And if it produced XML that included the profile IDs, you could even run it through a small plugin that would give you the complete online XML to play with, also fast. I'm just sayin ...

GSyren
Quoting mediadogg:
Quote:
- first empty with queen and elizabeth in the middle and last fields

Hm, you're not supposed to be able to add cast/crew with an empty first name field, so I wonder how that got there?

mediadogg
Quoting GSyren:
Quote:
Quoting mediadogg:
Quote:
- first empty with queen and elizabeth in the middle and last fields

Hm, you're not supposed to be able to add cast/crew with an empty first name field, so I wonder how that got there?

Dunno, but as we have together found things like the uncredited entry in a different locality, and printer control characters in the Overview, people don't always follow the rules!

Here is the profile: 4895024927206.21

    <Actor FirstName="" MiddleName="Zhang" LastName="Ziyi" BirthYear="0" Role="Hu Li" CreditedAs="" Voice="false" Uncredited="false" Puppeteer="false"/>

mediadogg
Couple of other matching tips that most programmers have run across, but a reminder:

- use .Trim() to remove leading and trailing blanks
- use .ToLower()  (or ToUpper()) to remove case sensitivity
- convert double blanks ("  ") to single blanks(" ") to account for sloppy typing inside a field
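
The three tips above can be rolled into one helper. Here is a minimal Python sketch, with `strip()` and `lower()` standing in for `.Trim()` and `.ToLower()`. The function name `normalize` is made up. One assumption goes beyond the tips: it collapses any run of whitespace, not just double blanks, because a single pass that replaces double blanks still leaves a double blank behind when there were three in a row.

```python
import re


def normalize(text: str) -> str:
    """Trim the ends, collapse inner whitespace runs to one blank, and lower-case."""
    return re.sub(r"\s+", " ", text.strip()).lower()


print(normalize("  Queen   Elizabeth "))  # queen elizabeth
```

Run both the search string and each database value through the same helper before comparing them.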