Duplicated locations within a POI file

 

Several times now I've seen my GPS report multiple instances of a POI at almost the same location. This seems to be an error, but in some cases I'm having trouble figuring out how to report it. In my current case I'm looking at 2 Wendy's restaurant locations on the same intersection. Unfortunately, the person or people who put the file together weren't completely consistent in naming the stores: some are "Wendy's", some are "Wendys", and some are even "Wendys Restaurant". In this case both locations show as "Wendy's", and my nuvi doesn't even give me their lat and long values.

So how can I report such a duplication to try to have it corrected if the location name is useless for identifying an individual location? Is there any easy way to find these lines in the original POI file, realizing that they may not even be adjacent in the file? And does anyone even care and want the data cleaned up, or am I just sounding like a whiner?

Sort by Lat and Long columns

Load the POI file into Excel and sort by column 1 and column 2. That will get them grouped by location. You will easily be able to see if you have duplicate or even almost duplicate entries that way.

Duplicated locations within a POI file

If you have Excel, load the CSV file and sort by longitude. That should put any duplicates close enough together to make them easy to find. Delete the duplicates and save the file.
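For anyone who would rather script it than use a spreadsheet, the same sort is a few lines of Python. This is just a sketch, assuming the common Garmin CSV layout of longitude, latitude, name; adjust the indices if your file differs. The sample rows are made up.

```python
import csv
import io

def sort_pois(lines):
    rows = [r for r in csv.reader(lines) if r]
    # Sorting numerically on the coordinate columns puts near-identical
    # locations on adjacent rows, where duplicates are easy to spot.
    rows.sort(key=lambda r: (float(r[0]), float(r[1])))
    return rows

# Hypothetical sample rows; a real file would come from the download.
sample = io.StringIO(
    "-80.123456,36.000000,Wendys\n"
    "-78.781984,35.912487,Wendys\n"
    "-78.781990,35.912480,Wendys\n"
)
for row in sort_pois(sample):
    print(",".join(row))
```

Note the `float()` in the sort key: sorting the coordinate strings as text would put "-9.5" after "-10.5", which is exactly the kind of thing that scatters near-duplicates apart.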

Johnc's reply wasn't posted yet when I sent mine. I guess we think along the same lines. smile

--
Anytime you have a 50-50 chance of getting something right, there's a 90% probability you'll get it wrong.

POI Verifier

In addition to Excel, I use POI Verifier II which identifies duplicates. In fact, you can even specify "how close" to consider a duplicate.

http://showcase.netins.net/web/notchfamily

Tim

Duplicates

Maybe it's not a duplicate in the file. Maybe it's one in the built-in database, and one in your custom file.

--
Frank DriveSmart55 37.322760, -79.511267

.

Frovingslosh wrote:

Several times now I've seen my GPS report multiple instances of a POI at almost the same location. This seems to be an error, but in some cases I'm having trouble figuring out how to report it. In my current case I'm looking at 2 Wendy's restaurant locations on the same intersection.

Contact the author to get the file corrected.

If it's the Wendy's POI file (Wendys_Restaurants.csv) from here, contact scottk:
http://www.poi-factory.com/user/50342/contact

--
Nüvi 2595LMT

not a duplicate with a built-in POI, that may be there also

phranc wrote:

Maybe it's not a duplicate in the file. Maybe it's one in the built-in database, and one in your custom file.

No, it's duplicated. I can see both locations in the custom POIs when I pull them up. Same distance from me, same direction, same intersection when I tap on "map" view from inside the custom POI list. But the name is always Wendys and I can't easily get the exact coordinates of the POIs when I pull them up this way, so it makes finding the entries in the file a pain.

I will try the spreadsheet approach. I do have an old copy of Excel, but it is on an older computer, not one that I use for my GPS work. I expect that this may be what finally pushes me to get around to installing Open Office on a few of my newer machines (I really think this site should start recommending Open Office rather than Excel, or at least give them equal mention rather than just mentioning the overpriced Excel). It still could be a pain, though, depending on how far apart the two first coordinates are. Remember, you're looking at a stripe of area that runs all the way across the country, and there may be other matching POIs in there somewhere, even on the other side of the country.

This is certainly one reason why we need some POI standards, and I think one of them should be that all POIs in a large set don't get the same name; there should be some way to distinguish them. I would suggest city or community after the name, but even company store numbers would be better than nothing.

I'll post back the results when I find the match.

thanks

tupdegrove wrote:

In addition to Excel, I use POI Verifier II which identifies duplicates. In fact, you can even specify "how close" to consider a duplicate.

Thanks, I'll give this a look. Seems like the kind of thing we could use.

but what do I tell him?

WAASup wrote:

Contact the author to get the file corrected.

Yea, I saw the name on the data in the POI section. But I feel that I need to give him something more specific than "you have duplicate POIs in there". Even telling him the nearest intersection isn't going to be much help, particularly if (as is likely) he doesn't happen to live in the area. So I'm trying to figure out how to pass back good information when all the GPS displays for this intersection are two POIs, both named only "Wendys", at approximately the same location.

I have a couple of approaches to work on now, and one may even be a very handy tool that makes it much less tedious. But I see this as a problem that is likely to come up again and again, and it is good to have some discussion about it.

.

What is the intersection?

--
Frank DriveSmart55 37.322760, -79.511267

lat 35.912487° long -78.781984°

This particular one is at the intersection of Glenwood Ave and Brier Creek Parkway, Raleigh NC. Google Earth coordinates given above. The only Wendys there is on the west corner of the intersection.

I have not gotten to the application that I was pointed to yet. Very windy here today and we had a power outage this afternoon, I'm just getting power back and restarting systems. It is on the to-do list though.

If you do beat me to this one and let the maintainer know, please post back so I don't duplicate the effort and bother him. But I'm glad I asked, was pointed to what looks like it should be a very handy tool.

no sale

tupdegrove wrote:

In addition to Excel, I use POI Verifier II which identifies duplicates. In fact, you can even specify "how close" to consider a duplicate.

http://showcase.netins.net/web/notchfamily

Tim

Well, looks like that is out of the question.
He wants payment up front for a serial number before you can try it. He built the damn thing to require MS .NET 2 (which I have so far successfully resisted). He warns you to set your browser privacy to "low" (sounds fishy). And he will only deal with PayPal, which I absolutely never ever will.

Perhaps POIFactory should own a copy of this and use it to verify all of their POI files, although even for that I would never advocate making any kind of transaction through PayPal. However, I was just trying to help POIFactory track down a duplication. I have no real incentive to chance buying this guy's software to do that.

I'm off to get the Open Office alternative to Excel to take that approach.

Let me know

Frovingslosh wrote:
tupdegrove wrote:

In addition to Excel, I use POI Verifier II which identifies duplicates. In fact, you can even specify "how close" to consider a duplicate.

http://showcase.netins.net/web/notchfamily

Tim

Well, looks like that is out of the question.
He wants payment up front for a serial number before you can try it. He built the damn thing to require MS .NET 2 (which I have so far successfully resisted). He warns you to set your browser privacy to "low" (sounds fishy). And he will only deal with PayPal, which I absolutely never ever will.

Perhaps POIFactory should own a copy of this and use it to verify all of their POI files, although even for that I would never advocate making any kind of transaction through PayPal. However, I was just trying to help POIFactory track down a duplication. I have no real incentive to chance buying this guy's software to do that.

I'm off to get the Open Office alternative to Excel to take that approach.

Every file that comes through the moderation queue has been run through POI Verifier, and POI Factory uses PayPal for its checkout procedure.
The creator of POI Verifier is a very well respected member of our community. You also get at least 2 or 3 opportunities to use and test it before the serial number is needed to unlock the full program.
I do see that this file has quite a few duplicate locations in it. It will take some time to clean up; I will work on it tomorrow.
On files of this size it is good to change the variance so that more duplicates show up that might be a few feet apart but reflect the same building.

Miss POI

I own POI Verifier

and can absolutely say that dealing with RT, the developer of POI Verifier, has been nothing but POSITIVE. Several times I (me, not the program) had problems, and RT assisted me in solving them!
I wouldn't put a file, even the ones I write, into my GPSr's without running POI Verifier.
When I was writing the Rest Areas Combined POI file I must say that POI Verifier was invaluable in finding MY errors. Every update HAS to go thru POI Verifier before I will upload it.
I have also used it to assist other POI submitters in finding duplicates and other errors in their files.

Nothing but positives from this user !

--
MrKenFL- "Money can't buy you happiness .. But it does bring you a more pleasant form of misery." NUVI 260, Nuvi 1490LMT & Nuvi 2595LMT all with 2014.4 maps !

I 2nd MrKenFl's experience...

Nothing but positives to say about POI Verifier II and before that POI Verifier... I think we all owe retired technician a vote of thanks, not a bunch of complaining about a very reasonable cost...

--
It is terrible to speak well and be wrong. -Sophocles snɥɔnıɥdoɐ aka ʎɹɐƃ

spreadsheet followup

miss poi wrote:

Every file that comes through the moderation queue has been run through POI Verifier, and POI Factory uses PayPal for its checkout procedure.
The creator of POI Verifier is a very well respected member of our community. You also get at least 2 or 3 opportunities to use and test it before the serial number is needed to unlock the full program.

Oh, I mean no disrespect to him at all and have no resentment toward him. It's just that when Tim told me to download and use the verifier program to check the POI file for duplicates, he didn't mention that he expected me to buy it.

I did read through the instructions, saw no mention of a trial run, so I just decided not to take this approach. The author has every right to sell his software, I just wasn't looking to buy something to try to help track down these duplicates.

I'm not crazy about his choices to build upon .NET 2, to ask you to lower your browser privacy setting, or to only use PayPal, but these are choices he is free to make. I don't particularly like seeing PayPal get more and more invasive in Internet transactions to the exclusion of all else, so I'm quite willing to tell any author or vendor who makes that exclusive choice, or a choice that favors PayPal at the expense of other options, that he may be cutting out some customers.

miss poi wrote:

I do see that this file has quite a few duplicate locations in it. It will take some time to clean up; I will work on it tomorrow.
On files of this size it is good to change the variance so that more duplicates show up that might be a few feet apart but reflect the same building.

Miss POI

Yea, I got OpenOffice and loaded the CSV into a spreadsheet, sorted it, and was amazed to see how many very obvious duplicates there are even looking at the data in that crude way. Since you have the better tool, I'll leave the cleanup work to you.

I'm glad to report that OpenOffice did a great job of opening the csv file. About the only thing I had to do that I didn't expect was increase the displayed accuracy of the coordinates.

I suggest that rather than tell users to open a csv file with Excel, members here just use the term "spreadsheet", or mention the options Excel and Open Office, or even just suggest Open Office rather than the much more expensive Microsoft software.

If you are using windows...

You can also open it with notepad...

--
It is terrible to speak well and be wrong. -Sophocles snɥɔnıɥdoɐ aka ʎɹɐƃ

Raleigh Wendy's Duplicates

Lines 4856 and 4857 in Wendys_Restaurants.csv.

8000 Pooler Rd, Raleigh, NC - (919) 293-1814
9000 Pooler Rd-Raleigh-NC-(919) 293-1814

According to the Wendy's store locator, 9000 Pooler is the correct one.

I've notified the file's owner.

--
Nüvi 2595LMT

but that takes us back to the original question

aophiuchus wrote:

You can also open it with notepad...

Sure, or any text editor. But that takes us back to the original question: how do I find two duplicate POIs that are close together but not exact matches, when the store names are all the same and I don't have exact coordinates for the POIs, so that I can accurately report them to the file maintainer? Using a spreadsheet was suggested because it gives you an easy way to sort the list by the coordinate fields, which should at least get the duplicate POIs close together. Unfortunately, they are not always adjacent, since whichever coordinate you sort by, you end up searching through a narrow stripe that extends either horizontally or vertically through the whole country.

The verifier software is a great approach, since it computes a distance between all of the POIs and can find ones that are closer than a given threshold. From there human inspection should be able to determine if the POI is duplicated or if, for example, there really are Starbucks on two corners of the intersection. But as I was just looking to find one error in someone else's file I wasn't looking to buy software to complete that task, thus the spreadsheet approach. The points could be widely separated in a text file, so notepad doesn't do much to help you find close but non-exact duplicates.
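For what it's worth, the threshold idea is simple enough to sketch in a few lines of Python. This is not POI Verifier, just a rough illustration of the same technique: compare every pair of POIs and flag the ones closer than a cutoff, wherever they sit in the file. It assumes longitude,latitude,name columns and a flat-earth distance approximation that is fine at city scale; the sample data is made up.

```python
import math

# Flag every pair of POIs closer than max_feet apart. Brute force
# O(n^2), which is fine for a few thousand rows.
def near_duplicates(rows, max_feet=300):
    hits = []
    for i in range(len(rows)):
        lon1, lat1 = float(rows[i][0]), float(rows[i][1])
        for j in range(i + 1, len(rows)):
            lon2, lat2 = float(rows[j][0]), float(rows[j][1])
            # Equirectangular approximation: shrink the longitude
            # difference by cos(latitude), then scale degrees to feet
            # (one degree of latitude is roughly 364,000 feet).
            dx = (lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
            dy = lat2 - lat1
            feet = math.hypot(dx, dy) * 364000
            if feet < max_feet:
                hits.append((rows[i][2], rows[j][2], round(feet)))
    return hits

# Hypothetical data: two entries for the same Wendy's, one far away.
pois = [
    ("-78.781984", "35.912487", "Wendys #1"),
    ("-80.500000", "36.500000", "Wendys #2"),
    ("-78.781900", "35.912500", "Wendys #3"),
]
print(near_duplicates(pois))
```

Since it compares all pairs, this catches duplicates no matter how far apart the lines are in the file, which is exactly what the sorted-stripe approach can miss. A human still has to decide whether a flagged pair is a real duplicate or two genuine stores on opposite corners.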

Debating whether to respond.

Not wanting to start an argument, I debated whether to respond; but thought he had a right to know the answers to his questions:

Frovingslosh wrote:

I just had thought when Tim told me to download and use the verifier program to check the POI file for duplicates that he didn't mention he expected me to buy the program.

In effect, you're not paying for the program, only the cost of webhosting and domain name charges. Presently, it isn't even covering these expenses as many here already have the program.

Frovingslosh wrote:

I just wasn't looking to buy something to try to help track down these duplicates.

It does far more than find duplicates.
http://www.poi-factory.com/node/16674
In addition, unless you're a programmer, you don't realize the time spent writing a program, not to mention the cost of the programs we have to purchase to accomplish this.

Frovingslosh wrote:

I'm not crazy about his choices to build upon NET2,

It basically isn't much of a choice. Many programs require the Microsoft .NET Framework in one form or another, including Symantec's backup program and Microsoft's programming software. I don't know what you have against the .NET Framework ... it meets your requirement of being 'FREE'.

Frovingslosh wrote:

to ask you to lower your browser privacy setting, or to only use PayPal, but these are choices he is free to make.

I elected to go through PayPal for many reasons, one of which you seem to be concerned about ... I don't have to handle any credit card numbers or any personal information, just the way I want it. As far as lowering your Privacy settings, this is required for PayPal to get your information. Also, as stated in the website, you don't have to pay with PayPal, you can pay with credit cards going through PayPal. It doesn't cost you a penny, I end up paying for that service on my end. I have NEVER had a problem with PayPal. They get a cut out of each transaction; but it's well worth it to me to have them handle all personal information.

RT

--
"Internet: As Yogi Berra would say, "Don't believe 90% of what you read, and verify the other half."

no argument here

retiredtechnician wrote:

Not wanting to start an argument, I debated whether to respond; but thought he had a right to know the answers to his questions:

Thanks. I hope that you appreciate that I'm not trying to start an argument either. I was just pointed to the program and told to download it, with no mention that it had to be licensed; I said that I would try it, and chose not to buy it when I found the licensing issue. I have no dispute that you have every right in the world to sell your software, and I don't expect that you have any dispute when someone chooses not to purchase it.

retiredtechnician wrote:

In effect, you're not paying for the program, only the cost of webhosting and domain name charges. Presently, it isn't even covering these expenses as many here already have the program.

Again, no question as to your right to sell a program and even to make a profit from it. If you're taking a loss I expect you could find other hosting alternatives, but that's your choice. I'm not some communist who thinks all software has to be free, and I've even sent in "donations" for free software when I was particularly impressed with it. I just wasn't looking to buy something for this task to help someone else find their mistake. I apologize if my remarks came across any other way.

retiredtechnician wrote:

In addition, unless you're a programmer, you don't realize the time spent writing a program, not to mention the cost of the programs we have to purchase to accomplish this.

Actually, I worked as a programmer for a number of years. I completely understand that you're not getting rich off of this and what effort likely went into it, and I did see from the site what other things it said it would do. I just wasn't looking for any of that, had one task in mind and decided to pass when I got to the licensing point. No one was wrong here.

retiredtechnician wrote:

It basically isn't much of a choice. Many programs require the Microsoft .NET Framework in one form or another, including Symantec's backup program and Microsoft's programming software. I don't know what you have against the .NET Framework ... it meets your requirement of being 'FREE'.

I'm very uncomfortable with many of the things Microsoft does. As far as I can see, the .NET products seem designed to lock things much more closely to the way MS wants them to work, and I don't trust that they have the customers' interests or security in mind at all. Plenty of good code runs just fine without .NET. I don't know exactly what convinced you that you needed to use it, but again, neither of us wants an argument. You choose to use it; that is completely your right. I choose to avoid programs that try to convince me to install it; that would seem to be my right. I don't know why you say I require that a program be free, just because I chose not to buy a copy for this simple task. I use free software. I buy software. There are lots of things in this world that I choose not to buy, for any number of reasons, and that hardly means I think they should be free or that there is anything wrong with them.

retiredtechnician wrote:

I elected to go through PayPal for many reasons, one of which you seem to be concerned about ... I don't have to handle any credit card numbers or any personal information, just the way I want it. As far as lowering your Privacy settings, this is required for PayPal to get your information. Also, as stated in the website, you don't have to pay with PayPal, you can pay with credit cards going through PayPal. It doesn't cost you a penny, I end up paying for that service on my end. I have NEVER had a problem with PayPal. They get a cut out of each transaction; but it's well worth it to me to have them handle all personal information.

RT

Again, your choice, and I do know that PayPal is very popular. But they have done some sleazy things, and their parent company has done plenty more. I choose not to support them in any way, including making a credit card purchase through them. I certainly don't like seeing their growing dominance on the Internet, and I am not hesitant to let others know that people who feel as I do exist. It's not a matter of paying anything; you couldn't give me money through PayPal, as I absolutely would never accept it under any conditions or for any amount. I believe I have that right.

Super!

Frovingslosh wrote:

Actually, I worked as a programmer for a number of years.

Now this great website will have another software contribution to 'remove duplicates'!

RT

--
"Internet: As Yogi Berra would say, "Don't believe 90% of what you read, and verify the other half."

Frovingslosh

I purchased POI Verifier a few months ago (and use it just like MrKenFL) and didn't remember the different requirements.

I'll try to remember to use the term spreadsheet as I agree Open Office is an alternative.

Tim

For Mac

I wish POI Verifier was around for those of us on the Mac Platform.

Another comment about these duplicates: having made a few small POI files, I totally respect the work some of you guys have done on these big, big files. I can't imagine the time it takes you. You are due a few duplicates, I would think. Thanks for your hard work.

--
NUVI 660, Late 2012 iMac, Macbook 2.1 Fall 2008, iPhone6 , Nuvi 3790, iPad2

Verified this weekend

geochapman wrote:

I wish POI Verifier was around for those of us on the Mac Platform.

With the help of a POI Factory member, we verified that POI Verifier will run on Mac OS X 10.5 by using VMware Fusion 1.1.3 with Vista (I assume XP as well). May not be what you want, but it's one option.

Thanks Bob for your help.

RT

--
"Internet: As Yogi Berra would say, "Don't believe 90% of what you read, and verify the other half."

I do not use Vista on my iMac

I realize more than one thing would run on my Mac if I turned it into a Vista PC, but I do not want to do that! OS X is a fine stable platform, and it would be nice to have a few things like POI Verifier available in OS X format. If not, I guess I'll skip them, but the authors of these programs are missing out on the great number of Mac users, and our number is growing.

--
NUVI 660, Late 2012 iMac, Macbook 2.1 Fall 2008, iPhone6 , Nuvi 3790, iPad2

Original Question response

You should always try to contact the author first instead of Miss POI or the site. (She certainly has enough to do!) It's the most direct approach. We're all a (generally) polite group of people with the desire to help each other. If the author doesn't have contact info available, then address the changes to the site.

As for identifying the offending duplicates, if the names are identical, send the coords (even if they're also identical). Regardless of the tool(s) the author uses, they'll be able to locate the questionable entries with the coords.

Another advantage of a .csv file

I would think if you opened a .csv file in Excel or something similar, you could do a sort on the Lat, Long columns and find duplicates kinda easy and delete them.

--
NUVI 660, Late 2012 iMac, Macbook 2.1 Fall 2008, iPhone6 , Nuvi 3790, iPad2

CSV and Open Office

You do not need Excel to work with the CSV files. You should be able to open them with OpenOffice Calc and accomplish the same functions as with Excel; just make sure the import is set to use commas to separate the data.

sorting wasn't the only issue

geochapman wrote:

I would think if you opened a .csv file in Excel or something similar, you could do a sort on the Lat, Long columns and find duplicates kinda easy and delete them.

Actually, I can sort a text file just fine, and I can certainly sort a database. But by the time we see them, a POI file is either a CSV or a GPX (unless you have a TomTom or other strange hardware). I can see this would be harder with a GPX file. The big problem with duplicates like this is that the lat and long are not always exact duplicates, and since you're looking at a stripe down or across the country when you look at adjacent POIs, duplicates are not always adjacent.

It's just not that easy to ID such duplicates visually if their names are all the same. Since I first asked, I have discovered that if the points are nearby I can pull them up in "extras" sorted by proximity, and I can use "more information" to try to get some extra details to help ID them. That does work for the Wendy's sites and helps ID them, but it would not be of any help with a three-field, all-common-name file like "Bahama Breeze".

Excel formula for crude test of proximity

Here's my two cents: 1) I use Excel to sort by lat column ascending, then by long column ascending, then by column C ascending. 2) I use an empty column to the right, usually column E, to enter the following formula in cell E2.

=IF(AND(ABS(A2-A1)<0.001,ABS(B2-B1)<0.001),"CLOSE2UP",IF(AND(ABS(A3-A2)<.001,ABS(B3-B2)<.001),"CLOSE2DWN","FAR"))

I then apply this formula to all cells in column E by dragging the handle of the E2 box all the way down the list.

This formula is not rocket science. All it does is compare each line to the one above and below it and determine if the lats & longs differ by less than .001 of a degree. Roughly this locates coordinates that are within 800 feet of each other. That's sufficient accuracy for most applications when dealing with “point objects”. If you're working with “area objects”, such as parks or stadiums, then change the criterion to something like .01, which roughly translates to 1.5 miles. Obviously there are a range of possibilities between .01 and .001.

Each coordinate that is close to the line below it will have “CLOSE2DWN” in the E column; conversely, the line below is obviously close to the line above it and so that line will have CLOSE2UP in the E column.

The reason I enter the formula in cell E2 is that the result is undefined for cell E1 because there is no coordinate line above it.
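For anyone working outside Excel or Calc, the same neighbour test is easy to reproduce in a short script. A sketch, assuming latitude in the first column and longitude in the second (matching the sort order described above), with made-up sample rows:

```python
# Label each row of a sorted list the same way the spreadsheet formula
# does: CLOSE2UP if it is within tol degrees of the row above,
# CLOSE2DWN if within tol degrees of the row below, FAR otherwise.
def label_rows(rows, tol=0.001):
    def close(a, b):
        return (abs(float(a[0]) - float(b[0])) < tol
                and abs(float(a[1]) - float(b[1])) < tol)
    labels = []
    for i, row in enumerate(rows):
        if i > 0 and close(row, rows[i - 1]):
            labels.append("CLOSE2UP")
        elif i + 1 < len(rows) and close(row, rows[i + 1]):
            labels.append("CLOSE2DWN")
        else:
            labels.append("FAR")
    return labels

# Hypothetical rows, already sorted by lat then long.
rows = [
    ("35.912487", "-78.781984", "Wendys"),
    ("35.912490", "-78.781990", "Wendys"),
    ("36.500000", "-80.500000", "Wendys"),
]
print(label_rows(rows))  # → ['CLOSE2DWN', 'CLOSE2UP', 'FAR']
```

Like the formula, this only compares adjacent rows, so it depends on the sort having already brought the duplicates together; the first and last rows get a one-sided comparison rather than the undefined result Excel would give in E1.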

.

j0ekan0 - or you could just download a free program which allows you to find duplicates by distance! http://turboccc.wikispaces.com/Extra_POI_Editor

thanks for the links

I'll use them to work on and improve the red light camera file... smile

I'm trying to combine some of the Chicago red light databases together with the poi-factory red light database to create one, all inclusive Chicago red light db. Weeding out duplicates just got a whole lot easier.

Cool

uncouth wrote:

I'll use them to work on and improve the red light camera file... smile

I'm trying to combine some of the Chicago red light databases together with the poi-factory red light database to create one, all inclusive Chicago red light db. Weeding out duplicates just got a whole lot easier.

I see that you have not had access to our red light camera file yet. When you do get access please let me know if I have missing locations.

Please only report locations that you have visually seen. I am pretty sure that I have all the locations in Chicago in the file.

I don't use data from other sites like mine so please do not submit a list if it comes from a competing source.

Miss POI

Duplicates Across Multiple files

Hi,

I have just started using POI loader and would like to know how people handle duplicates across several files.

I have download collections such as

St. Louis Dog Parks
St. Louis Metro Area attractions
St. Louis Parks File
Missouri State Parks with camping
Missouri Department of Conservation Areas
, etc.

When I call on my POI list, I select all of my "--Places" POI collections so some of my nearby places appear two or three times in my list.

I can live with this, but was wondering if anybody had a good technique to remove duplicates or near duplicates across files?

Thank you

--
- Missouri, Garmin 750 &, 255W

Extra Poi Editor

Extra Poi Editor at http://turboccc.wikispaces.com/Extra_POI_Editor can find duplicates. Once you load the file, go under Edit and select Find Duplicates. I also use the free version of ASAP Utilities, which is really good for finding duplicates, plus you can color-code them. Once you find duplicates, you need to report them to the author of the file using the contact tab.

--
Charlie. Nuvi 265 WT and Nuvi 2597 LMT. MapFactor Navigator - Offline Maps & GPS.

'cuz you're using multiple files

WalkThisWay wrote:

Hi,

I have just started using POI loader and would like to know how people handle duplicates across several files.

I have download collections such as

St. Louis Dog Parks
St. Louis Metro Area attractions
St. Louis Parks File
Missouri State Parks with camping
Missouri Department of Conservation Areas
, etc.

When I call on my POI list, I select all of my "--Places" POI collections so some of my nearby places appear two or three times in my list.

I can live with this, but was wondering if anybody had a good technique to remove duplicates or near duplicates across files?

Thank you

Because you're using multiple files as your source, there are bound to be duplicates. The only way to get rid of them would be to sort on the coordinates after combining the files, and then edit the duplicates down to one entry that notes it came from multiple files.
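One way to automate that combine-and-sort step, sketched under the assumption of longitude,latitude,name rows: round the coordinates to about 0.001 degrees, bucket every row by the rounded pair, and report any bucket fed by more than one source. (Two points straddling a rounding boundary can slip past this, so treat it as a first pass, not a guarantee.) The file names and locations below are made up.

```python
# Group rows from several POI collections by rounded coordinates and
# report locations that appear more than once across the collections.
def cross_file_duplicates(files, digits=3):
    buckets = {}
    for name, rows in files.items():
        for lon, lat, label in rows:
            key = (round(float(lon), digits), round(float(lat), digits))
            buckets.setdefault(key, []).append((name, label))
    # Keep only buckets that collected more than one entry.
    return {k: v for k, v in buckets.items() if len(v) > 1}

# Hypothetical collections with one park listed in two files.
files = {
    "StLouis_Dog_Parks.csv": [("-90.2000", "38.6300", "Shaw Dog Park")],
    "StLouis_Parks.csv": [("-90.2001", "38.6299", "Shaw Park"),
                          ("-90.3000", "38.7000", "Forest Park")],
}
dups = cross_file_duplicates(files)
for coords, entries in dups.items():
    print(coords, entries)
```

Each reported bucket tells you which files contributed the entry, so you can decide which collection keeps the point and which one drops it.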

--
ɐ‾nsǝɹ Just one click away from the end of the Internet

Thank you

Thank you!

--
- Missouri, Garmin 750 &, 255W

Duplicate remover

There is also this software that removes duplicates:

http://sourceforge.net/projects/poi-dup-finder/