Spiders don’t use Screen Readers (SEO vs Web Accessibility)

Written for the RI:Technology Blog

How often have you been asked, “So if we don’t use Flash, this will be searchable/accessible, right?” As though there is some new compound word describing a site whose content is easily available to all non-human user agents.

Ah, we should be so lucky! While some coding practices aid in both SEO and web accessibility, there are some fundamental differences between the practices.

One of the most basic differences is the intent. When looking at a web project, it is completely acceptable to prioritize which content you want a search engine to index; for example, a company may not care if its short-lived events data gets indexed. SEO is about attracting traffic to your site. In contrast, web accessibility is about ensuring an individual can use your site once he’s on it.

Up until, oh, two weeks ago, Flash was commonly considered to be unsearchable. A way to ensure the content of the site could still be indexed (and therefore show up in Google) was to write the content to the HTML page, and then, if the visitor had Flash available, overlay the static text with a richer experience. This worked fine for allowing a search engine spider to index the content; however, it didn’t always give a user of assistive technology a good experience.

There is a common belief that users of assistive technologies don’t or can’t access Flash, so they would get the stripped-down, text-only version. This isn’t always the case: they may get the Flash-enabled version, like other human visitors. Well, except their actual experience is significantly different.

For anyone who has never seen a screen reader in action, I highly recommend you check out this Introduction to Screen Readers movie.

Flash has had accessibility properties available to developers since Flash MX, and Adobe Flex provides built-in “accessible components.” However, unlike the recent announcement about .swf indexing not requiring any additional effort on the part of developers, creating an accessible .swf experience does require some work. As well, accessibility for .swfs depends on MSAA (Microsoft Active Accessibility), so it is platform dependent. Even a diligent developer will find his hard work is all for naught if his visitor is on a Mac.

As you can see, “searchable” and “accessible” cannot be used interchangeably. While the tactics for each don’t necessarily conflict, there are different goals and different considerations to take into account.

Flash Indexing

As a previously scheduled post on accessibility and indexability went live, a few folks pointed me to some news on searchable/indexable swfs.

A few of the articles I checked out:

  1. Google Now Crawling and Indexing Flash Content
  2. Improved Flash Indexing (Official Google Webmaster Central Blog)
  3. SWF searchability FAQ

I will admit I approached the articles with a critical eye; Google has been flirting with retrieving some amount of content from .swfs for quite a while. Yet for the first time, I got a sense there has been real progress.

The premise is that Google and Yahoo! spiders will access the content via an enhanced Flash player. This enhanced player will give the search engine spiders the ability to navigate within the Flash experience, and access and index associated resources.

This is an exciting prospect, as until now many site designers were resigned to duplicating the content that was available from within Flash on the HTML page wrapper that housed the Flash. This followed the web development strategy of ‘progressive enhancement’, where a non-Flash-enabled site visitor (like the Googlebot) would be able to access at least the core content, and the more capabilities the visitor possessed (CSS, rich media), the more enhanced their experience. In addition to potentially increasing maintenance costs (to keep the two versions in sync), implementing this method is sometimes not feasible at all, depending on the complexity of the application.

I was eager to see how what I knew about Flash accessibility best practices came into play, and read through the documentation. As I did so, however, I found I had more questions than answers. In the Google Webmaster Central Blog, there is an intriguing statement:

we do not generate any anchor text for Flash buttons which target some URL, but which have no associated text.

When I first read this, I believed it meant that some links may not be followed. This makes sense from the standpoint that a button with no associated text would essentially be a hidden link, and following it may inaccurately represent the content of the site. However, the statement actually focuses on the generation of anchor text. I am not clear where this generation would take place; perhaps in a virtual buffer of all the Flash content? How does the content of the link (assuming that it DOES get followed) get associated with the overall Flash content, since there is no anchor text?

Another consideration is the use of tabindices. When coding Flash for accessibility, tabindices may be used to specify reading order. Is this something that search engine spiders will be aware of? Equally, there is a recommendation in the Google docs to “consider replacing the text within an image.. [to make] ..less informative content.. invisible to [Google]”.

This statement made me question the sophistication of this enhanced player. For years, Google has managed to determine that items such as copyright statements are not significant content items. So why are they unable to do so now that the content is coming from a .swf? The recommendation to move content from an accessible to an inaccessible form seems terribly shortsighted and irresponsible.

We are now quite sophisticated in using semantic markup for HTML pages to offer search engine spiders some information about the relative importance of elements. I can only assume that all text being pulled from a Flash element is given equal weighting. If this is the case, then, as is noted in the Adobe Developer Center documentation, we will certainly need to see “best practices emerge over time for creating SWF content that is more optimized for search engine rankings”.

Another major challenge in opening applications up to search is being able to direct the searcher to the relevant section within the experience. This is also a concern with accessible PDFs. Much of the documentation recommended the use of deep-linking. However, it’s not clear to me how the spider is made aware of these deep-links. I will admit that my own exposure to deep-linking with a Flash experience is limited: we did this for the People’s Choice Awards site, where querystring parameters were fed into the .swf using flashVars. While the Adobe Developer Center documentation mentions this practice (“you can create multiple HTML files that provide different variables to the SWF and start your application at the correct subsection”), I hadn’t been aware that Google supported variables in their search result URLs…
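
To make the querystring-to-flashVars idea a little more concrete, here is a minimal server-side sketch in Python. It is not the code used on the People’s Choice Awards site; the function name, the parameter names in ALLOWED_PARAMS and the example URL are all assumptions made for illustration. It simply copies the deep-link parameters from the page URL into the string that would be handed to the .swf as its FlashVars value.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical allow-list: the only deep-link parameters our imaginary
# .swf knows how to act on. Anything else in the URL is ignored.
ALLOWED_PARAMS = ("section", "item")


def flashvars_from_url(page_url: str) -> str:
    """Build a FlashVars string from the query string of the page URL."""
    query = urlsplit(page_url).query
    # parse_qsl decodes the values; urlencode re-encodes them safely.
    pairs = [(name, value) for name, value in parse_qsl(query)
             if name in ALLOWED_PARAMS]
    return urlencode(pairs)


if __name__ == "__main__":
    url = "http://example.com/awards.html?section=nominees&item=42&ref=mail"
    print(flashvars_from_url(url))
```

The returned string would then be written into the page as the value of the FlashVars parameter on the embedded movie, so that each URL a search result points to starts the application in the matching subsection.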

There was also some mention made that external files linked to from within the .swf will be indexed, but separately. The implication is that the contents of a data file will show up in search results, separate from its presentational format (and overall context). While I assume this will be resolved in future releases, a diligent developer will likely want to ensure their “include” files are not accessed on their own. I believe my colleagues did something similar when we launched the Wal-Mart Halloween Flash/HTML Hybrid site last year. They did some great work with deep-linking and history management, and handled orphan content loading (I refer anyone interested in the specifics to Toby Miller). My concern is that based on how this functionality was announced (that developers did not need to do anything for their swfs to be indexed), there will be little motivation to ensure content is always delivered in the proper context.

Obviously, I am very interested to see if this development will enhance the experience of users of assistive technologies. Sadly, I’m not sure it will, as the major breakthrough has been made with the enhanced player. Unless Adobe also plans to work with makers of assistive technologies, I don’t know that any of these benefits will be realized. If anything, site designers may stop some of their earlier practices (textual alternatives).

I’m very interested to know if any of the accessibility properties and best practices have made it into this enhanced search — how great would it be if the use of these properties increased the weighting of content!

what’s the deal with… findability, searchability, indexability and accessibility?

As a front-end web developer, I often hear the terms “findable”, “searchable”, “indexable” and “accessible” thrown around interchangeably. For many, they mean that the content can be accessed by a non-human, be it a screen reader or a search engine spider. On some level this is true, but there are several significant differences that must not be overlooked.

For the sake of this discussion:

  • Findable: how easily a site can be found when using a search engine (rankings). Yes, I realize that this term also refers to how easily content can be found once the user is on the site, but I’m ignoring that aspect of it for now…
  • Searchable: how easily specific content within a site can be accessed when using a search engine (deep-linking)
  • Indexable: how easily the content of a site may be retrieved and used in search engine results
  • Accessible using AT: how easily someone using assistive technologies can use your site

(ShoeMoney.com has compiled a list of definitions for SEO from some industry experts, as well)

A site created completely in Flash or Flex may be findable thanks to the use of meta-data, but it is not indexable. With some diligent coding, information may be searchable, but this is no guarantee that it will be accessible.

(Not content with these descriptions? Have more to add? Please let me know what you think in the comments!)

As I’ve mentioned, my background is in accessibility: prior to coming to Resource, I worked on large subscription-based web applications. SEO was not a consideration at all. However, accessibility was. When I first came to Resource, I was eager to see how the two complemented and contrasted each other.

Overall, I see some overlap between the areas. However, their focus is different.

SEO is based on a page mentality - this is apparent in the search results that come up. Many common SEO techniques are applied at the page level, via adding meta tags or optimizing title tags. This is how a site that requires login, or is built using a technology like Flash or Flex, can appear in search results. A search engine can access meta information about the page, and use that to rank it. Findability relates to the notion of the discovery of the page itself.
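
As a rough illustration of that page-level view, the Python sketch below pulls out only the title and the named meta tags from a page, ignoring whatever sits inside the embedded movie. It is an assumption-laden toy, not a description of how any real search engine works; the class and function names and the sample markup are invented for the example.

```python
from html.parser import HTMLParser


class PageMetadataParser(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Meta names are treated case-insensitively in this sketch.
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_metadata(html: str) -> dict:
    """Return the page-level information a metadata-only crawler could see."""
    parser = PageMetadataParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}


if __name__ == "__main__":
    sample = (
        "<html><head><title>Awards</title>"
        '<meta name="description" content="Vote now">'
        "</head><body><object data='site.swf'></object></body></html>"
    )
    print(page_metadata(sample))
```

Everything inside the object element is invisible to this kind of pass, which is why such a page can be findable without being indexable.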

A secondary notion is that of searchability. A web application may be found on Google, but can the specific content that is being sought be retrieved? Searchability refers to the idea that a site visitor can easily navigate to the specific information he’s searching for within the site, once the site itself has been discovered.

Both searchability and indexability deal with how elements of the page can be accessed, but arguably in different directions. Deep-linking into a Flash movie may facilitate searchability, helping a site visitor dig into the site at a specific point. In contrast, indexability refers to the ability of a search engine spider to do a broad pull of content from the site.

Where SEO and Accessibility really start to diverge is when we move beyond the retrieval of content itself. A search engine spider is only interested in the data, so that the appropriate search result may be returned to an information seeker. In contrast, accessibility refers to the ability of a site visitor to navigate within an experience. The implications are significant: each interaction must be coded in a way such that a screen reader user can activate the change, and be notified of any changes that occur.

Another important distinction is the extent to which the site content is made available. A site may work to optimize or only make indexable certain aspects of the site. In contrast, accessibility refers to the ability of all content to be available and able to be engaged with.

Strategies for Blogging and Social Network Marketing: A Case Study (PodCamp Ohio)

The final session of the day that I attended was on strategies for blogging and social network marketing. Some of the content was similar to the viral campaign session I’d attended earlier, but I liked the use of one specific case study to frame their work.

Right away speaker Bill Balderaz of Webbed Marketing laid out the three things you need for success:

  1. a compelling hook
  2. the right channels
  3. identify client goals

In the case study he shared with us (Shizuka New York), the compelling hook was “bird poop facials”.

A good litmus test for whether or not your idea is compelling: would you talk about it at dinner? A new CEO hired from a competitor? Nah. But bird poop facials? Sure!

Bill mentioned four specific channels to consider:

  1. SEO Press release
  2. Blogger outreach
  3. video
  4. Social networks

I wasn’t really familiar with the term “SEO Press release”, but it was quite interesting. Bill mentioned that they will search for specific phrases on search engines to ensure the uniqueness of their phrasing. That way they can be sure that when monitoring buzz or search queries, all the results are directly tied to their efforts. He did acknowledge that the more newsworthy your story, the more likely a journalist will snap up the idea and write about it in their own words. In this case, your carefully chosen phrasing is lost.

Throughout the presentation, Bill was very diligent about showing us the “before and after”, highlighting the importance of analytics and establishing your measures for success. We looked at Google News, which had 2 links to the company in May, and roughly 50 post-campaign.

Blogger outreach refers yet again to really figuring out the type of influentials to tap.

As for social networking, Bill said that they did not try to build for or leverage all the social networks. He said they actually received the most traffic from StumbleUpon, which was a surprise to me. I didn’t realize it was such a big player. He also acknowledged that, like it or not, you can’t ignore MySpace.

Supposedly CNN ran this story on the front page one day, but still 46% of the traffic came from social networks. While CNN gave a one-day spike in traffic, the networks were overall more significant.

Someone asked about the time this campaign took, and he said the video shoot was the biggest task, coming in at about 10 hours. The rest of the campaign and marketing was about 40 hours. In the end, the company saw traffic increases from all sources, not just referring sites. People weren’t just clicking on links they had presented to them; bird poop facials at Shizuka had reached a point where people were talking or thinking about them, and motivated to seek them out.

He talked some more about compelling ideas and hooks, including the work they did for Hatteras Networks (the cash cow), or the scantily clad etymologist at HotForWords.com.

While I don’t know that this session really offered me many “strategies” for blogging and social media marketing, I did find it interesting. I appreciated the focus on the results achieved, and how they were measured. In many ways I feel that analytics is still in its infancy, and I appreciated the approach that was taken to demonstrate the campaign’s success.

Artificial Intelligence: a solution for Artificial Content? (Fighting Hidden Keyword Spam)

I was recently doing some research for a blog post I’m writing on screen readers vs SEO for the RI:Technology Blog. A 2005 blog post by Matt Cutts from Google entitled “SEO Mistakes: Unwise Comments” elicited many concerns about the use of hidden content being considered keyword spam.

There are plenty of legitimate reasons for hiding text from a sighted user on page load, and in many cases, this is simply a stylistic effect and the content will be surfaced as a result of user interaction. It is not about the use of the technique, but rather the misuse. The official Google Webmaster Guidelines do have a page dedicated to hidden text and links, but it also lists as a basic principle:

Make pages primarily for users, not for search engines. Don’t deceive your users or present different content to search engines than you display to users, which is commonly referred to as “cloaking.”

So how do we determine if a technique is being used appropriately? There has always been the old standby technique of disabling CSS. Does the page still make sense, or is it littered with content not meant for human consumption? This would solve our concerns about “instructional” help for users of assistive technologies, and the suppression of content until the user opts to display it. (Indeed, it is a progressive enhancement best practice to have the content on the page and then hide it using JavaScript anyway, so that it is available even if JS is turned off.) This works fine for intelligent human users, but we all know that in GoogleLand, human reviews are NOT a desired goal.

So what about artificial intelligence as an option? Ever so slowly (yet steadily), we are moving forward in the area of natural language processing. What if AI and NLP were used to assess the semantics of page contents? When I access a website, I don’t expect to see a series of keywords. A human accessing a page is looking for content, not keywords describing the content. Some intelligence could be used to identify the overall syntax of the content, to ensure it’s legitimate “Content”.

Naturally, specific page elements would have to be accounted for. A list of navigation links may look suspiciously like keywords. This is where semantic markup comes into play, in particular some of the new tags proposed for HTML5 (nav or section, for example) or roles outlined in WAI-ARIA. A series of (internal) links would be expected in the nav element, but a collection of random words not appearing in proper syntactic form elsewhere in the document would be considered suspect.
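
To give a feel for what such a check might look like, here is a deliberately naive Python sketch. It is nowhere near real natural language processing and makes no claim about how any search engine works: it skips anything inside a nav element, then flags longer blocks of text that contain almost no function words (“the”, “of”, “in”), on the assumption that real sentences need them and keyword lists don’t. The word list, the block-level tag list, the thresholds and all the names are assumptions picked for the example.

```python
import re
from html.parser import HTMLParser

# Small, hand-picked set of English function words (an assumption).
FUNCTION_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was",
    "we", "with", "you",
}
# Tags treated as boundaries between blocks of text (also an assumption).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "section"}


class TextBlockCollector(HTMLParser):
    """Collects text blocks, ignoring everything inside <nav>."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._nav_depth = 0
        self._buffer = []

    def flush(self):
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.flush()
            self._nav_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag == "nav":
            self._nav_depth = max(0, self._nav_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._nav_depth == 0:  # a list of links is expected in nav
            self._buffer.append(data)


def suspicious_blocks(html, min_words=8, min_function_ratio=0.1):
    """Return text blocks that read like keyword lists rather than prose."""
    collector = TextBlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    flagged = []
    for block in collector.blocks:
        words = re.findall(r"[a-z']+", block.lower())
        if len(words) < min_words:
            continue  # too short to judge either way
        ratio = sum(word in FUNCTION_WORDS for word in words) / len(words)
        if ratio < min_function_ratio:
            flagged.append(block)
    return flagged
```

A check this crude would misjudge plenty of legitimate content (product lists, tag clouds, poetry), which is exactly why the semantic hints from the markup matter.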

Obviously, whenever there are rules, there will be people setting out to break them. But if we are cognizant of how these black hat techniques differ from legitimate best practices, surely we can filter them out as such. It’s a shame to penalize those who are honestly working to enhance the user experience, not cater to search engines.

Or, as Eric Meyer stated at the Spring Break conference last week, the best Google juice is having good content so that everyone wants to link to you. Do it right, and the hits will come organically.
