Saturday, April 7, 2007

Google seeks better access to gov't info

Google seeks better access to government information

Oct 25, 2006

Officials from the leading Internet search engine are working to remove barriers that prevent their technology from reaching vast troves of information buried in government databases.

Internet users want government information because it has a reputation for being reliable and accurate, said J.L. Needham, a strategic partner development manager at Google. But while portions of agency Web sites are easily indexed by Google and other common search engines, the engines cannot search other areas, known as the deep Web.

For instance, Google cannot scan information in the database housed at the Environmental Protection Agency's Regulations.gov Web site, Needham said. The site allows users to view government regulations and post comments on proposed agency rules.

"If you were a business owner and found out you were potentially subject to a new regulation that you wanted to find out more information on, it may be difficult to find this information using a search engine like Google," Needham said. "The problem is that search engines are unable to crawl the full text of many government agencies' databases."

As much as 40 percent of the content on agency Web sites is invisible to Google's crawlers, Needham said. This means that for a majority of Internet users who do not know how to look beyond a search engine site, that information is effectively invisible.

Needham said he is meeting with a variety of agencies to discuss how the information housed in their databases can be made available in the search results from engines such as Google, Yahoo or MSN. One method would be to use Google Sitemaps, which enhances Google's search results, Needham said.

Implementation of Google Sitemaps by a federal institution that maintains one of the world's largest networks of sites, including many databases, doubled the number of Web links found by Google, Needham said. This allowed for millions of new documents to be included in search engine results, he said.
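For readers wondering what a sitemap involves, here is a minimal sketch in Python of how an agency might list database-backed pages so that a crawler can find them. It assumes the public Sitemap XML format (a `urlset` element containing `url` entries, each with a `loc` and an optional `lastmod`). The `build_sitemap` helper and the example.gov addresses are hypothetical and are not taken from the article or from any agency's actual setup.

```python
import xml.etree.ElementTree as ET

# Namespace used by the public Sitemap XML format (assumed here).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap XML string listing (location, last-modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # lastmod is optional; include it only when a date is known.
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical record pages that a crawler could not reach by following links.
records = [
    ("https://agency.example.gov/docket?id=1001", "2006-10-01"),
    ("https://agency.example.gov/docket?id=1002", None),
]
sitemap_xml = build_sitemap(records)
```

The idea is that each database record gets its own listed address, so a search engine no longer has to submit a query form to discover it. Which records are safe to expose is a separate decision for the agency.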

A Dec. 16, 2005, memorandum from Clay Johnson, deputy director for management at the Office of Management and Budget, required all agencies by Sept. 1, 2006, to set up their public information so that it is searchable. It stated that "increasingly sophisticated Internet search functions" can "greatly assist agencies in this area."

Agencies also were required to provide all public data in an open format that allows the public to aggregate "or otherwise manipulate and analyze the data to meet their needs" by Dec. 31, 2005, according to a separate OMB memorandum signed by Johnson on Dec. 17, 2004.

Mark Luttner, director of EPA's Office of Information Collection in the Office of Environmental Information, said the agency's e-rulemaking program management office is working with OMB to respond to a recent request from a search engine company that wants to index the Regulations.gov data.

In addition to the technical challenges presented by the company's request, EPA has to consider whether a commercial company could assert proprietary ownership on federal data and whether providing government data to one company would provide an unequal playing field for other companies, Luttner said.

Needham said Google, for one, does not want to assert ownership over any information obtained from agencies, and agency efforts to improve the ability to search their Web sites would likely be equally beneficial to its competitors.

Commonly used search engines like Google are able to index other agency Web sites used to disseminate information, such as the Small Business Administration's Business Gateway e-government initiative.

Nancy Sternberg, the program manager for Business Gateway, said the initiative's Web site, Business.gov, has been optimized for all major search engines. But Business.gov does not contain a separate database, Sternberg said, which would make indexing much more challenging.

Search engines cannot index the Grants.gov database housed at the Health and Human Services Department, according to John Etcheverry, director of grants systems modernization at HHS. But in 2007, Grants.gov will implement a Google search appliance that will let Google scan specified database tables with grant synopsis information, he said. Allowing search engines to crawl the entire Grants.gov database would create security vulnerabilities since it contains sensitive applicant information, he noted.

Google's forays into the government include a U.S. Government Search Web page, which is intended to provide a single location for searching across agency information and for keeping up-to-date on government news. Google maintains the site is not intended to compete with the government search site hosted by the General Services Administration, called FirstGov.gov. Rather, it is intended to complement it, company officials say.

John Murphy, director of FirstGov.gov technologies, said the FirstGov.gov pages are optimized for all search engines, but the MSN-run search tool is specifically directed to searching government Web pages, including those hosted by state and local governments.



©2007 by National Journal Group Inc. All rights reserved.

Wednesday, April 4, 2007

Following is a clip describing an interesting initiative of the Citizens League. Facts Unfiltered offers an ideal opportunity to contribute public information to the discussion. Depository librarians and others familiar with government information sources may be willing to post information about sources, access, and finding tools to this public discussion of issues -- issues discussed more knowledgeably by those who have identified and checked public information sources.

http://map150.org/facts-unfiltered/

Main Page

From MAP150 Wiki

Facts Unfiltered

One of the facets of MAP 150 this fall is to gather facts around key policy issues. The questions we’re posing focus on the capacity of different aspects of Minnesota in the future. The facts that we gather will be used to supplement our interviews with Minnesotans to better understand Minnesota’s policy environment. These are the questions that we’re looking to put facts to.

Family’s capacity: What are the future prospects for Minnesota families in making ends meet?

State’s capacity: What does the fiscal health of the state budget hold for Minnesotans: can they continue to expect the same level of services at the same tax rates in the future?

Economy’s capacity in terms of human capital: Is the state’s educational system producing the labor force we’ll need to maintain Minnesota’s strong economic growth?

Capacity of the natural and built environments: Given Minnesota’s projected population growth, is Minnesota creating and preserving built and natural environments that will sustain Minnesota’s prized quality of life?

Capacity for innovation in the public interest: Is Minnesota retaining its capacity for civic and political innovation, especially in ways that reflect the growing diversification of the state population?



Government Information has wide impacts

Anyone paying attention to the news recently has noticed that the topic of home mortgage foreclosures is hot. Lots of ink (and electricity for pixels) has been used to document the tide of mortgage foreclosures and their impact on families, cities and the international financial markets.

If you (as a member of the public) want to understand where all this is coming from, you need government information. Data on foreclosure sales comes from local sheriff's departments (Hennepin & Ramsey Counties locally). Data on home mortgages, borrowers and lenders comes from Home Mortgage Disclosure Act (HMDA) data compiled by the federal government.

Analyses of these data show who is being affected, how it is impacting our communities and what's likely to happen in the next few years.

If you want to do something about this problem (either as a softhearted liberal or a tough-minded investor), you should thank the public agencies that collect and report this information.

Public access - it's not academic, it's the real world, the real economy.

Imagine if this information was NOT public!

CRS reports - action needed

From Open the Government --

Policy and News Updates for March 20, 2007

[new] Accessing Congressional research
American taxpayers spend nearly $100 million a year to fund the Congressional Research Service (CRS), an arm of the Library of Congress whose 700 researchers provide reports to members of Congress on a variety of topics relevant to current political events. However, these reports are not open to the public. The best way for the public to obtain a CRS report is through their member of Congress, but the process is slow and it requires that the requester know the report exists.

On March 28, coalition partners of OpenTheGovernment.org, along with the Center for Democracy and Technology and others, sent a letter to House Speaker Nancy Pelosi [D-CA], the Senate Rules Committee, and the Committee on House Administration asking that CRS reports be made available to the public. According to the letter, "Public demand for these reports has never been higher. In a little more than a year, members of the public have downloaded more than 3.5 million CRS reports from OpenCRS.com, a Web site that provides a searchable database of CRS reports that have been obtained by various archivists and members of the public."

Making CRS reports available to the public makes sense. According to the letter, "CRS already maintains a fully searchable, password-protected Web site for members of Congress... Increasing capacity and providing public access to that site would constitute a trivial expense for the Library of Congress or for the House in light of their current levels of traffic." As Paul M. Weyrich, Chairman and CEO of the Free Congress Foundation, notes in a commentary piece, "The proliferation of free information online has only strengthened the argument that CRS also should offer free information."

Focus has turned to CRS recently, due to two memos CRS Director Daniel P. Mulhollan issued noting changes in CRS policies. On March 20, Mulhollan issued a memo to all CRS staff saying, "I have concluded that prior approval should now be required at the division or office level before products are distributed to members of the public. This policy is effective immediately." In the past, it was possible for members of the press, other researchers, and other government officials to request specific reports from the congressional support agency. However, as Secrecy News reports, this memo changes that policy. A staffer told Secrecy News, "We're all sort of shaking. I can't do my work."

In addition, some members of Congress have criticized CRS for its recent decision to stop its reports on earmarks. In a Wall Street Journal editorial titled "Earmark Cover-Up", John Fund accuses CRS of "helping its masters hide wasteful spending." Despite requests from Sen. Coburn [R-OK] and Sen. Jim DeMint [R-SC], a private Feb. 22 directive from Director Mulhollan states that "CRS will no longer identify earmarks for individual programs, activities, entities, or individuals," ending its 12-year practice of reporting on earmark data. Mulhollan responded to the editorial in a memo to CRS staff and a letter to the editor, available through Secrecy News.

TAKE ACTION: Help add to the Open CRS collection! Call your member of Congress and request a PDF copy of the following CRS Report. Once you receive it, submit it to Open CRS.
Report Order Code: RL31686
Title: Demilitarization of Significant Military Equipment

Tuesday, April 3, 2007

Lots of news on government information

3/28 PPD, p. 2C. In the National subsection, a story headlined "Research firm sues to see auto safety data", is about a group that studies tire failures suing the U.S. Department of Transportation.

3/29 Strib, p. E4. In "News of the Weird", the first item describes a website operated in Finland that lets voters match their physical appearance with that of candidates so they can vote for people who look like them.

3/30 Strib. p. A10. Story, headlined "Soldiers' VA cure hindered by lapses in use of digital medical data".

3/30 Strib. p. A12. Story, headlined "Interior official altered reports, inspector says", is about another instance of government reports being altered for political reasons. The story also describes release of confidential information to selected businesses.

3/30 PPD, p. 1C. Story, headlined "Ridder's clear break scuffed", includes allegations that the new publisher of the Strib took confidential data from the PPD.

3/30 PPD, p. 1A. Story, headlined "St. Paul candidates join You Tube nation".

3/30 PPD, p. 7a. Story, headlined "Bush appointee altered species reports to benefit landowners", is the same story as described above but with a more direct headline.

3/31 PPD, p. 2C. Story, headlined "Technology 'xxx' hits red light again", is about defeat of a proposal to give pornography its own address on the web.

3/31 PPD, p. 1C. Story, headlined "Ridder says exit made in good faith", continues the saga of what Ridder took from St. Paul to Minneapolis on his laptop.

4/1 PPD, p. 3A. Story, headlined "Bush fills attorney post with insider" is based on analysis of the resumes of new U.S. attorneys.

4/1 Strib. p. A9. Story, headlined "Many prosecutors had inside track", is the same as the Pioneer Press story mentioned above.

4/1 Strib. p. D1. Story, headlined "In a hurry for ultrafast Internet", is about the City of Eagan looking for high speed internet for the entire city including the possibility of laying fiber citywide.

4/1 Strib. p. B1. Nick Coleman's column, headlined "If Heffelfinger hadn't quit would he have been purged", is Coleman's take, using public e-mails, on the local effect of the U.S. attorney fiasco.

4/2 Strib, p. A11. Story, headlined "From cabarets in Cyprus to drivers in Qatar, dangers abound for Americans abroad, the State Dept. warns", is about a State Department catalog of the dangers of foreign travel, which is compiled mostly for businesses, according to a spokesperson, but is available to anyone.

4/2 PPD, p. 2B. In the "Bulletin Board" feature is an item, headlined "Will the library let me borrow this book? It's 'TOO SOON TO TELL'", about a technology twist in libraries.

Monday, March 26, 2007

Newspaper Stories Overview

3/22 PPD, p. 9B. Story, headlined "Railroad accused of destroying documents", is about the Canadian Pacific RR allegedly destroying documents and e-mails to keep them from being used in a lawsuit involving chemical spills in Minot, ND.

3/22 PPD, p. 2A. Story, headlined "Inefficiency curbs world food aid", is another government report documenting how poorly a government program to feed the hungry has been run for the last few years including a 43% drop in food delivered.

3/22 PPD, p. 5a. Story, headlined "Anti-Clinton online ad creator unmasked", is another demonstration of the power of the net and of the poor judgment of some of the young people who are adept at using it.

3/22 Strib. p. A1. Story, headlined "Huckleberry Finn wins first round in St. Louis Park", documents the latest attempt to censor Twain's classic.

3/23 PPD. p. 2C. In the "Technology" subsection there is a story, headlined "Net neutrality studied", about the beginning of an FCC study which will look at how to keep the net available to everyone.

3/23 Strib. p. B5. Story, headlined "Traffic camera bill hits red light", is about the photo cop bill dying (perhaps) in a House of Reps. committee.

3/23 Strib. p. B5. Story, headlined "Funding sought for program to protect battered women", is about another instance of a good program, protecting addresses of battered women, which has not been started because of lack of funding.

3/23 Strib. p. A1. Story, headlined "Doctors ties to drug firms raise concerns", is about reports that drug companies have been required to file with the state Board of Pharmacy describing payments to doctors and for what. It was only recently that anyone looked at the reports. The executive director of the Board is quoted in the article as saying that now that they know there is interest in the reports they may be posted on their web site. http://www.startribune.com/1244/story/1074012.html

3/24 Strib. p. A7. Story, headlined "Memo shows Gonzales approved of Attorneys' firings", continues to show the importance of actual data in sorting out what our government is doing.
http://www.startribune.com/587/story/1076414.html

3/24 Strib. p. A6. Story, headlined "France's X-files - first country to put UFO sightings on a web site", is about another interesting use of the net.

3/24 PPD. p. 1A. Story, headlined "Gonzales, aides met to discuss firing attorneys", is similar to the Strib story noted above.

3/25 PPD. p. 2C. Story, in Technology subsection and headlined "Search suit dismissed", is about a lawsuit brought by a web site alleging Google intentionally designed its index to make the site difficult to find.

3/25 PPD. p. 1E. The Watchdog column has a story entitled "Travel Insurance" which includes commentary about how credit card companies monitor the uses of cards in real time.

Friday, March 23, 2007

3/19 PPD, p. 6B. This is David Broder's column, headlined "Accountability matters, but voters wanted a lot more than investigations", in which, among other things, Broder reports on changes to the federal FOIA (Freedom of Information Act).

3/19 Strib, p. A3. Story, headlined "Smithsonian records show off curator's champagne lifestyle", demonstrates why records that show how taxpayers' money is spent must be public.

3/18 PPD, p. 14A. Story, headlined "Young woman's DNA points to an inevitably grim fate", is about the effect of DNA testing on actual individuals.