As a previously scheduled post on accessibility and indexability went live, a few folks pointed me to some news on searchable/indexable SWFs.
A few of the articles I checked out:
- Google Now Crawling and Indexing Flash Content
- Improved Flash Indexing (Official Google Webmaster Central Blog)
- SWF searchability FAQ
I will admit I read the articles with a critical eye; Google has been flirting with retrieving some amount of content from .swfs for quite a while. Yet for the first time, I got a sense there has been real progress.
The premise is that Google and Yahoo! spiders will access the content via an enhanced Flash player. This enhanced player will give the search engine spiders the ability to navigate within the Flash experience, and access and index associated resources.
This is an exciting prospect, as until now many site designers were resigned to duplicating the content that was available from within Flash on the HTML page wrapper that housed the Flash. This followed the web development strategy of ‘progressive enhancement’, where a non-Flash-enabled site visitor (like the Googlebot) would be able to access at least the core content, and the more capabilities the visitor possessed (CSS, rich media), the more enhanced their experience. In addition to potentially increasing maintenance costs (to keep the two versions in sync), this method is sometimes not feasible at all, depending on the complexity of the application.
I was eager to see how what I knew about Flash accessibility best practices came into play, and eagerly read through the documentation. As I did so, however, I found I had more questions than answers. In the Google Webmaster Central Blog, there is an intriguing statement:
we do not generate any anchor text for Flash buttons which target some URL, but which have no associated text.
When I first read this, I believed it meant that some links may not be followed. This makes sense from the standpoint that a button with no associated text would essentially be a hidden link, and following it may inaccurately represent the content of the site. However, the statement actually focuses on the generation of anchor text. I am not clear where this generation would take place; perhaps in a virtual buffer of all the Flash content? And how does the content behind the link (assuming that it DOES get followed) get associated with the overall Flash content, given that there is no anchor text?
Another consideration is the use of tabindices. When coding Flash for accessibility, tabindices may be used to specify reading order. Is this something that search engine spiders will be aware of? Equally, there is a recommendation in the Google docs to “consider replacing the text within an image.. [to make] ..less informative content.. invisible to [Google]”.
This statement made me question the sophistication of this enhanced player. For years, Google has managed to determine that items such as copyright statements are not significant content items. So why are they unable to make that determination now that the content is coming from a .swf? The recommendation to move content from an accessible to an inaccessible form seems terribly shortsighted and irresponsible.
We are now quite sophisticated in using semantic markup for HTML pages to offer search engine spiders some information about the relative importance of elements. I can only assume that all text being pulled from a Flash element is given equal weighting. If this is the case, then as is noted in the Adobe Developer Center documentation, we will certainly need to see “best practices emerge over time for creating SWF content that is more optimized for search engine rankings”.
Another major challenge in opening applications up to search is being able to direct the searcher to the relevant section within the experience. This is also a concern with accessible PDFs. Much of the documentation recommended the use of deep-linking. However, it’s not clear to me how the spider is made aware of these deep-links. I will admit that my own exposure to deep-linking within a Flash experience is limited: we did this for the People’s Choice Awards site, where query string parameters were fed into the .swf using flashVars. While the Adobe Developer Center documentation mentions this practice (“you can create multiple HTML files that provide different variables to the SWF and start your application at the correct subsection”), I hadn’t been aware that Google supported variables in their search result URLs…
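To make the “multiple HTML files” idea from the Adobe documentation a bit more concrete, here is a minimal sketch of how such wrapper pages might be generated. It is only an illustration: the file name `awards.swf`, the `section` variable, and the section names are all made up, and this is not how the People’s Choice Awards site was actually built. The only convention I am relying on is that FlashVars is passed as a URL-encoded string in a `param` element.

```python
from html import escape
from urllib.parse import urlencode


def wrapper_page(swf_url: str, title: str, flash_vars: dict[str, str]) -> str:
    """Return a minimal HTML wrapper that starts the SWF at a given subsection."""
    # FlashVars is a URL-encoded string of name=value pairs.
    encoded_vars = urlencode(flash_vars)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f'<object type="application/x-shockwave-flash" data="{escape(swf_url)}">\n'
        f'  <param name="movie" value="{escape(swf_url)}">\n'
        f'  <param name="FlashVars" value="{escape(encoded_vars)}">\n'
        "</object>\n"
        "</body></html>\n"
    )


# Hypothetical sections of a Flash application (assumed names, for illustration).
SECTIONS = {"nominees": "Nominees", "winners": "Winners"}

# One wrapper page per section, each handing a different variable to the same SWF.
pages = {
    f"{section}.html": wrapper_page("awards.swf", label, {"section": section})
    for section, label in SECTIONS.items()
}
```

Each generated page has its own URL for a spider to find, and the SWF would still need its own logic to read the `section` variable and jump to the right place.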
There was also some mention made that external files linked to from within the .swf will be indexed, but separately. The implication is that the contents of a data file will show up in search results, separate from its presentational format (and overall context). While I assume this will be resolved in future releases, a diligent developer will likely want to ensure their “include” files are not accessed on their own. I believe my colleagues did something similar when we launched the Wal-Mart Halloween Flash/HTML Hybrid site last year. They did some great work with deep-linking and history management, and handled orphan content loading (I refer anyone interested in the specifics to Toby Miller). My concern is that, based on how this functionality was announced (that developers did not need to do anything for their SWFs to be indexed), there will be little motivation to ensure content is always delivered in the proper context.
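As a rough illustration of what guarding against orphaned “include” files could look like, here is a small sketch. Everything in it is an assumption on my part: the file paths, the `section` parameter, the `/index.html` wrapper, and especially the `requested_by_swf` flag (how a server would reliably tell the SWF’s own request from a direct visit is left open). It is not the approach my colleagues used; for that, Toby is still the person to ask.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical mapping from standalone data files to the section that displays them.
DATA_FILE_SECTIONS = {
    "/data/costumes.xml": "costumes",
    "/data/decorations.xml": "decorations",
}


def redirect_for_orphan(
    path: str, requested_by_swf: bool, wrapper: str = "/index.html"
) -> Optional[str]:
    """Return the wrapper URL a direct visitor should be sent to, or None.

    None means the request should be served normally: either the path is not
    one of the known data files, or the SWF itself is asking for it.
    """
    section = DATA_FILE_SECTIONS.get(path)
    if section is None or requested_by_swf:
        return None
    # Send the visitor to the deep-linked wrapper page for that section.
    return f"{wrapper}?{urlencode({'section': section})}"
```

A visitor arriving at `/data/costumes.xml` from a search result would be pointed at the wrapper page with the matching section, so the content shows up in its intended context.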
Obviously, I am very interested to see if this development will enhance the experience of users of assistive technologies. Sadly, I’m not sure it will, as the major breakthrough has been made with the enhanced player. Unless Adobe also plans to work with makers of assistive technologies, I don’t know that any of these benefits will be realized. If anything, site designers may abandon some of their earlier practices, such as providing textual alternatives.
I’m very interested to know if any of the accessibility properties and best practices have made it into this enhanced search — how great would it be if the use of these properties increased the weighting of content!