Saturday, February 25, 2012

Geospatial One-Stop >> Geo.Data.gov >> GeoPlatform.gov

Here are details about GeoPlatform evolution, provided by Esri's Geoportal Server lead, Marten Hogeweg, and published here with Marten's permission:


Geoplatform.gov is actually built on ArcGIS Online which is running on-premises in the GSA hosting environment. Geoplatform.gov currently has a small set of publishers who 'curate' the content that is visible, with a focus on web services.

When Geospatial One-Stop was retired, it was integrated into the Data.gov website as http://geo.data.gov. Geo.data.gov IS built on the open source Geoportal Server (http://esriurl.com/geoportalserver).

There were close to 650,000 items in Geospatial One-Stop, many of which were from state/local government or from academia and do not meet the criteria (http://www.data.gov/datapolicy) to be made discoverable through Geo.data.gov.

To provide access to that full set (by now grown to about 950,000 geospatial resources), the search from the Geoplatform.gov site was included.

There are really two programs (Geoplatform and Data.gov) that are looking at how to best implement their objectives/mandates: serve nationally significant geospatial data assets from and to the geospatial community at large vs. increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government (http://www.data.gov/about).

What you see now is (hopefully) a transition from the Geospatial One-Stop period into the new open government data period where these two objectives/mandates unite... 

Friday, February 24, 2012

Insider: How to Search the New Geoplatform.gov

Here is some insider info on how to search the new U.S. government Geospatial Platform developed by US EPA and GSA.

The GeoPlatform is built on Esri's open-source Geoportal Server.  Even though it looks like ArcGIS.com ...



... it's running Geoportal Server under the hood.  That means it should search geospatial metadata.  Since it's federal, it also federates -- harvesting and indexing metadata from many geodata servers.  Then it runs a federated query across all servers to find data you want.  
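The harvest-then-search idea can be sketched in a few lines. This is only a toy model of the concept, not Geoportal Server's actual code or API; the catalog names, records, and the `harvest` and `search` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    title: str
    abstract: str
    source: str  # name of the catalog the record was harvested from


def harvest(catalogs: dict) -> list:
    """Copy metadata records from every catalog into one combined index."""
    index = []
    for source, records in catalogs.items():
        for rec in records:
            index.append(
                MetadataRecord(rec["title"], rec.get("abstract", ""), source)
            )
    return index


def search(index: list, term: str) -> list:
    """Case-insensitive keyword match against title and abstract."""
    term = term.lower()
    return [
        r for r in index
        if term in r.title.lower() or term in r.abstract.lower()
    ]


# HYPOTHETICAL catalogs and records, for illustration only.
catalogs = {
    "catalog_a": [
        {"title": "Chesapeake Bay Shoreline", "abstract": "Shoreline vectors"},
    ],
    "catalog_b": [
        {"title": "Water Quality Stations",
         "abstract": "Monitoring sites in the Chesapeake watershed"},
        {"title": "Colorado Land Cover", "abstract": "Raster land cover"},
    ],
}

index = harvest(catalogs)
hits = search(index, "chesapeake")
for hit in hits:
    print(f"{hit.source}: {hit.title}")
```

One search term is matched against records gathered from every catalog, which is why a single query can return results from more than one server.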

So why doesn't anything show up when you search for 'chesapeake' ?



You have to click the "Related Searched" link farther down the page.  Now you get dozens of results in a format that looks familiar to anyone who works with metadata and geoportals:


The search engine is reaching back into the geodata.gov and data.gov metadata domains.

Why is the search function like this?  GeoPlatform gurus say it's only temporary - a work in progress.  In my view, it's intriguing work to merge the best of FGDC/ISO-compliant metadata, geoportal search capability, and web mapping provided by the Esri web mapping APIs.  More on that in my next post on geoportal news coming out of the Esri FedCon this week.




Sunday, February 12, 2012

No, Virginia. Google Maps Can't Really Predict Meth Labs Before They Open



At least, not the way you wish.


The recent FastCompany article on geospatial predictive analytics leads you to believe that Google Maps (with a nod to "GIS") can predict and find
  • the exact locations of clandestine meth labs,
  • city blocks that hide covert drug dealers,
  • the location of the next car break-in,
  • the suburban neighborhood where a street gang will next appear,
  • the exact point on the US-Mexico border where the next drug shipment will cross.
Yes, the meth lab may appear within this quarter mile area if you're in the city.  Or within these 18-50 square miles if you're in rural Colorado.

Size matters in geographic units

In cases like these, it's geographic scale and precision that separate fact from fiction.  A careful read of the GIS article, the police department planning documents, and the software sales literature will show that predictive analytics seldom approaches the location precision stated or implied here.

Look carefully at how they define "where".  Is it this street corner?  Or this census tract?  Or this county?   

All of the cited studies and crime prevention software rely on US Census demographics to characterize the human landscape. U.S. Census demographic data is compiled down to the block group level.  There are 211,267 block groups in the USA.  That puts the average area for a block group at 18 sq. miles.  That may be the size of the geographic area to which the predictive analysis says you should allocate your resources.  Or, if your study area is urban and densely populated, the average area of the census block group will be smaller -- for example, 0.25 sq. miles in Philadelphia.  The future meth lab may appear in an area that size.  Not exactly pinpoint accuracy.
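The 18 sq. mile figure is simple division. Here is a quick check, assuming a total U.S. area of roughly 3.8 million square miles (a rounded figure that I am supplying, not one taken from the cited studies):

```python
# Rough check of the average block group area.
# ASSUMPTION: total U.S. area of about 3.8 million sq. miles (rounded).
US_AREA_SQ_MILES = 3_800_000
BLOCK_GROUPS = 211_267  # count cited in the post


def average_area(total_area: float, unit_count: int) -> float:
    """Mean area per geographic unit."""
    return total_area / unit_count


avg = average_area(US_AREA_SQ_MILES, BLOCK_GROUPS)
print(f"Average block group: about {avg:.0f} sq. miles")
```

The average hides a wide spread: dense urban block groups are far smaller, and rural ones far larger.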

Other inputs to the predictive analytic model may be even less precise geographically.  In the analysis results, the geographic areas that are statistically significant may be quite large, either because there are too few input points within the area of study, or because the locations of those inputs are themselves not precise.

Size matters in the probability quotient

Can software predict when a crime or other event will happen?  That depends on what we mean by "when".  The answer is always a probability quotient.  In a recent Information Week interview,  SPSS technical director Bill Haffey said it right:   "It's not a binary yes or no; it's more of an assessment of risk--how probable something is."

TL;DR ?

Give GIS -- not Google Maps -- the credit for making location an important input to predictive analytics.  But not too much credit.  It's easy to sell Google Maps as a "secret weapon", or GIS-based predictive analytics as a silver bullet, when reality is down in the details.  Here are details from the case studies listed above:


Meth labs where they pop up next


"Map data analyzed over time successfully demonstrated the spread of meth labs throughout a metropolitan area--and even predicted where they would pop up next."


Facts -  from the source document:


"Spatial analysis ... shows that meth labs are clustered roughly in and around the downtown area ... in neighborhoods with a young and predominantly white population, small household size, and low education levels."


That's where you'll find the next meth lab.  Not on this street, but in any of the census tracts in your jurisdiction with that demographic profile.  Place your law enforcement assets there.


How will these demographics change with the next census?  Move your assets to those areas.
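In practice, "predicting where" at this scale amounts to filtering census tracts by a demographic profile. Here is a minimal sketch of that step; the tract records, field names, and thresholds are all hypothetical and are not taken from the source study.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    tract_id: str
    median_age: float
    pct_white: float           # fraction, 0 to 1
    avg_household_size: float
    pct_college: float         # fraction, 0 to 1


def matches_profile(t: Tract) -> bool:
    """True if the tract fits the profile; thresholds are HYPOTHETICAL."""
    return (
        t.median_age < 30
        and t.pct_white > 0.5
        and t.avg_household_size < 2.5
        and t.pct_college < 0.2
    )


# HYPOTHETICAL tracts, for illustration only.
tracts = [
    Tract("T-001", 27.5, 0.80, 2.1, 0.12),
    Tract("T-002", 41.0, 0.75, 2.8, 0.35),
    Tract("T-003", 29.0, 0.65, 2.3, 0.15),
    Tract("T-004", 26.0, 0.30, 2.0, 0.10),
]

flagged = [t.tract_id for t in tracts if matches_profile(t)]
print(flagged)
```

The output is a list of whole tracts, not addresses. That is the location precision this kind of analysis delivers.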


City block that hosts the discreet drug dealer


"Police departments ... are using similar methods ... to find blocks likely to host discreet drug dealers."


Facts - from the source document:


"The goal ...  [is] geospatial predictive analysis and threshold analysis to inform the focus of police and community resources."


Read New Haven's plan.  The goal is to allocate resources better -- a more modest goal than pointing out the city blocks where hidden drug dealers operate, or where the next car break-in will occur.


The suburb where street gangs will recruit next


One firm ... recently boasted of their ability to use predictive analysis to find suburbs that street gangs are likely to recruit in.


Facts - from the source document:


"Gang Recruitment Site Selection [determines the] level of future likelihood that the threat entity would gain a foothold in the new location ... [The] suburban location is shown to be at risk because it is suitable for future gang recruitment, creating an opportunity for actions to observe or disrupt recruitment and maintain safety."


Although not stated explicitly, the software described here certainly looks for a statistically significant relationship between gang activity and census demographics.  Once again, at a location precision comparable to the census block group (0.25 to 20+ sq. miles).


"Exact point where drugs will cross the border"


In the near future, the [DEA] is expected to start using newer, more sophisticated models that will enable DEA agents to predict the exact points at the border in which drug deliveries enter the United States.


Facts - from the source document:


"[DEA] EPIC’s newly established Predictive Analysis Unit produces reports summarizing drug seizures along routes identified as drug smuggling corridors and provides these reports to interdiction agencies ...  this unit [will be] conducting 'post seizure analysis' to identify the point at which drugs entered the United States and to determine the reasons for the failure to interdict the drugs at the border."


There is no exact-point targeting mentioned in the DEA source document.  And the analysis is not even predictive.







Mapping the Birthplace of Frederick Douglass




For Black History Month, I worked with Choptank River Heritage to map the "lost" birthplace of Frederick Douglass.  The Douglass birthplace site on Maryland's Eastern Shore is unmarked.  The nearest memorials are 7 and 14 miles from the actual site.




Check out the online map presentation, The Search for Frederick Douglass's Birthplace.


Research into the Douglass birthplace site was first published in 1985 by a local historian, Dickson Preston, in Young Frederick Douglass:  The Maryland Years.  Before that time, the site was unlocated.  The information first became widely available on the Web through my daughter Amanda's 1996 school project.

Friday, February 10, 2012

Explain it to Your Boss: The Map is There but Your Data is Still Here

Some business managers don’t want “our” data sitting on “their” map server.   So they’re reluctant to share sensitive business data in maps that are authored and published at public map portals.


They may think that when we publish our map data in the “cloud”, we give up ownership and the security oversight of our data.   That’s not necessarily so.  You can explain it to your boss like this, using an example from the ArcGIS Online mapping portal --  

You can sit behind your firewall and publish your sensitive data in a web map.  Here's what it looks like:



Your boss might not understand that although the map is at ArcGIS Online, your map data does not have to be.  Web page and map components can come from many public web servers, while the map data itself remains secure.  The map data comes from your secure GIS server:



It gets delivered to the web map only if the browser user has been granted access to your GIS server; that is, if the user’s computer is also sitting behind the same firewall as the GIS server, or if the user has login access to your GIS server from outside.

If no access is granted to your GIS server, the points will not appear on the web map.   Only the background (basemap) will appear.
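Here is a toy model of that behavior. It is not how ArcGIS Online or a GIS server actually enforces access (in reality the GIS server itself rejects the request); the `Layer` class, the host names, and `visible_layers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    host: str            # server that delivers this layer to the browser
    requires_auth: bool  # True if the server sits behind the firewall/login


def visible_layers(layers: list, user_has_gis_access: bool) -> list:
    """Names of the layers the browser can actually draw for this user."""
    return [
        layer.name
        for layer in layers
        if not layer.requires_auth or user_has_gis_access
    ]


# HYPOTHETICAL web map: a public basemap plus a secured business layer.
layers = [
    Layer("basemap", "public-basemap.example.com", requires_auth=False),
    Layer("business points", "gis.internal.example.com", requires_auth=True),
]

print(visible_layers(layers, user_has_gis_access=True))
print(visible_layers(layers, user_has_gis_access=False))
```

The web map definition is the same for both users. Only the secured layer's availability changes.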

The business data (map points) are all we need to own and control.  At the same time, we can build on free, publicly-available code and data platforms for map programming and publishing.   For example, the background map could come from public servers at Esri, Google, or OSM:



We don't have to build the map widgets ourselves.  In this example, they're sent to the browser from Esri’s Javascript API server.  But they could have come from Google Maps or another server:



The “Dojo-Javascript” layout of the web page – header, banner panel, navigation panel – could be from code that sits on the Google or Yandex CDN servers:



Code from other Web servers could also be embedded in the map or the web page.  For example, we might overlay our data layer with weather data coming from NOAA or tweets from Twitter.  So, in an example like this, we may rely on 4 or more different public web servers, all supporting the publication and sharing of our secure geospatial data.

So, your boss and trusted colleagues can sit behind our firewall and benefit from web mapping in the public cloud.  Their web browser reaches out to the public Internet to bring in elements of the map and page that surround your map data.  But the map data itself stays inside a secure channel from our GIS server to the web browser and map.

I know you understand this.  But your boss might not.


::: ------- :::

Not so simple?

My scenario above won't address the concerns of all business managers.  For example, I worked with a county health department whose management was reluctant to serve up water quality monitoring maps to the public unless they appeared on the county’s web site, rather than at ArcGIS.com.  It was an ownership and branding issue. 

Esri has tried to address this concern with its Public Maps Gallery template.  The Gallery is a clever approach that lets an organization author, publish, and host maps at ArcGIS Online, but deliver them in web pages that come from the organization’s own web server, with the organization’s own style and labeling.  No “ArcGIS Online” labels anywhere.  Too clever?  Your boss will have to decide.

Valid security issues -- not just branding -- are raised if you actually upload some or all of your data to the public map portal.  Or if you go to the public portal to author and publish information related to the map, such as authorship, map description, and geospatial metadata.  In that case, you need to understand the hosting service's security policy and implementation.  You can read about Esri's policy for ArcGIS Online here.