Crawlable Ember Apps


#1

I was wondering what approaches people are taking to make Ember apps crawlable by search engines. I know discuss is supposed to be crawlable, but looking through the source, it wasn’t obvious to me how they did it. I’d be interested in hearing about discuss’s approach and any others.


#2

I don’t have the answer, but I thought I’d add my two-penneth. I’ve not looked into it too far, because I only use Ember for one public-facing website, all the others are private applications which require a login.

I remember reading a while ago that Google executes JavaScript if it is all inline on one page, so that Googlebot doesn’t have to fetch external resources to run it. The experiment I read about involved semi-complex JavaScript, but I am not at all sure how Google handles Handlebars. I’d be very interested to see any experiments on whether Ember gets executed when it’s inline. That doesn’t mean we have to write all of our Ember as inline JavaScript, since we could use a Ruby/PHP script to inject it into the index file.

However, I’ve created an application in the past where I had limited time to spend on a non-JavaScript version but simply wanted it to be crawlable. For that I used PhantomJS to take the URLs (where they don’t include a hash) and render the JavaScript into a static HTML file, which was then cached for a predetermined duration before being regenerated upon request. If a genuine user visited the non-hashed URL (such as from Google), I would add the hash back into the URL and forward the user to the Ember version.
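
The caching and redirect steps could look roughly like the sketch below. It is only an illustration under stated assumptions: `createSnapshotCache`, `toHashUrl`, and the `render` callback are hypothetical names, and `render` stands in for whatever actually produces the HTML (a PhantomJS run, for instance).

```javascript
// Hypothetical helper: caches rendered snapshots for a fixed duration.
// `render(path)` is assumed to return the static HTML for a path.
// `now` is injectable so the expiry logic can be exercised without waiting.
function createSnapshotCache(render, ttlMs, now) {
  const clock = now || Date.now;
  const entries = new Map();

  return function getSnapshot(path) {
    const time = clock();
    const entry = entries.get(path);
    if (entry && time - entry.createdAt < ttlMs) {
      return entry.html; // still fresh, reuse the cached snapshot
    }
    // Missing or expired: render again and remember when we did it.
    const html = render(path);
    entries.set(path, { html: html, createdAt: time });
    return html;
  };
}

// Hypothetical helper: maps a crawlable path such as "/posts/1" back to
// the hash URL that the Ember app understands.
function toHashUrl(path) {
  return '/#' + path;
}
```

A server would call `getSnapshot(path)` for crawler requests and redirect ordinary visitors to `toHashUrl(path)`.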

With all that said, I’m far from an expert when it comes to this, and hopefully others with more experience can add their wisdom.


#3

As @sam said in this post, Discourse uses a noscript tag to make the content of the posts/topics available to crawlers.


#4

Not that anyone outside of the developer community cares, but this is really a Google problem. I mean, it’s our responsibility to make our apps crawlable, but… I think they realize that and are working on some secret new crawling techniques.

Do a search for pangratz’s emberjs dashboard and you’ll see recent results previewed in the search results.

Also search Google for this form; the previews show the Ember.js app and not the noscript version: http://cl.ly/image/1H2E0M131w1j


#5

You need to make your back-end support every URL that your Ember.Router does. Do a “view source” on different pages of discuss.emberjs.com, and you’ll see that each page serves static HTML that matches completely with the content you see on your screen. This HTML needs to have real links to the next pages.

Take the source code of http://discuss.emberjs.com/ for example. It contains HTML like this:

...
<div class="topic-list">
<a href="/t/welcome-to-the-ember-js-discussion-forum/185">Welcome to the Ember.js Discussion Forum</a> <span title='posts'>(4)</span><br/>
<a href="/t/how-to-add-child-in-ember-data/470">How to add child in ember-data?</a> <span title='posts'>(2)</span><br/>
<a href="/t/todomvc-based-getting-started-guide/433">TodoMVC-based Getting Started Guide</a> <span title='posts'>(2)</span><br/>
...

If you check the source for Welcome to the Ember.js Discussion Forum, you can see that it also contains the actual post content, author name, and everything else.

...
<div class='creator'>
    #1 By: <b>Tom Dale</b>, March 11th, 2013 12:23
  </div>
  <div class='post'>
    <p>Welcome to the Ember.js discussion forum.</p>

<p>We're running on <a href="http://www.discourse.org/" rel="nofollow">the open source, Ember.js-powered Discourse forum software</a>. They are also providing the hosting for us. Thanks guys!</p>
...

All this HTML should be wrapped in a <noscript>...</noscript> tag, so normal users with modern browsers only see the JavaScript-generated content. But the search engines will only look at what’s inside the noscript tag, which is how they are able to crawl your site.
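
As a rough illustration of that idea, the back-end could build the noscript block from the same data the Ember app loads. This is a minimal sketch, not how Discourse itself does it: `renderNoscriptTopicList` and `escapeHtml` are hypothetical helpers, and the topic objects (`url`, `title`, `posts`) are an assumed shape.

```javascript
// Escapes text so it is safe to place inside HTML content or attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: renders a list of topics as plain links wrapped in
// <noscript>, mirroring the markup shown above. Each topic is assumed to
// look like { url: '/t/some-topic/1', title: 'Some topic', posts: 2 }.
function renderNoscriptTopicList(topics) {
  const links = topics.map(function (topic) {
    return '<a href="' + escapeHtml(topic.url) + '">' +
      escapeHtml(topic.title) + '</a>' +
      " <span title='posts'>(" + Number(topic.posts) + ')</span><br/>';
  });
  return '<noscript>\n<div class="topic-list">\n' +
    links.join('\n') +
    '\n</div>\n</noscript>';
}
```

The important part is that every link is a real `href` that the back-end can answer with its own noscript content.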


#6

My plan, unless I figure out something better:

  • use PhantomJS to crawl my site for me and produce static files; and
  • in a Spring servlet, use the Robots database to serve the static files to known bots instead of the AJAX site.
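
The bot-detection half of that plan could be sketched as follows. It is written in JavaScript rather than as a servlet, and everything in it is an assumption for illustration: the short token list is a stand-in for a real robots database, and `isKnownBot`, `snapshotFileFor`, `chooseResponse`, and the `snapshots/` directory are hypothetical.

```javascript
// A tiny stand-in for a robots database: lowercase substrings that appear
// in the User-Agent header of some well-known crawlers.
const KNOWN_BOT_TOKENS = ['googlebot', 'bingbot', 'yandexbot', 'baiduspider'];

// Returns true when the User-Agent contains one of the known bot tokens.
function isKnownBot(userAgent) {
  const ua = String(userAgent || '').toLowerCase();
  return KNOWN_BOT_TOKENS.some(function (token) {
    return ua.indexOf(token) !== -1;
  });
}

// Maps a request path to the static file a crawl is assumed to have
// produced: "/" becomes "snapshots/index.html", "/about/team" becomes
// "snapshots/about/team.html".
function snapshotFileFor(path) {
  const trimmed = path.replace(/^\/+|\/+$/g, '');
  return 'snapshots/' + (trimmed === '' ? 'index' : trimmed) + '.html';
}

// Decides what to serve: a static snapshot for bots, the AJAX app otherwise.
function chooseResponse(userAgent, path) {
  return isKnownBot(userAgent)
    ? { kind: 'snapshot', file: snapshotFileFor(path) }
    : { kind: 'app', file: 'index.html' };
}
```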

pjscrape looks like it might come in handy.

Aside from this being an awful lot of work to do for SEO, does this sound possible? Obviously it would be automated.


#7

I developed a tool that helps create HTML snapshots dynamically, in real time.

I have used this tool in all of my Ember applications and have had no problems with indexing on Google and Bing.

To use it with Ember, you need to change the router’s location to hash.

Tool: ajax-seo

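### Example:

```
(function() {

var get = Ember.get, set = Ember.set;

// Registers a 'hashbang' location so that routes are written as #!/path
// instead of #/path.
Ember.Location.registerImplementation('hashbang', Ember.HashLocation.extend({

  // Returns the current path without the leading "#!".
  getURL: function() {
    return get(this, 'location').hash.substr(2);
  },

  setURL: function(path) {
    get(this, 'location').hash = "!"+path;
    // Store the bare path so it matches what onUpdateURL reads back;
    // storing "!"+path here would make the comparison below never match.
    set(this, 'lastSetURL', path);
  },

  onUpdateURL: function(callback) {
    var self = this;
    var guid = Ember.guidFor(this);

    Ember.$(window).bind('hashchange.ember-location-'+guid, function() {
      Ember.run(function() {
        var path = location.hash.substr(2);
        // Ignore the hashchange event triggered by our own setURL call.
        if (get(self, 'lastSetURL') === path) { return; }
        set(self, 'lastSetURL', null);
        callback(path);
      });
    });
  },

  // Used when generating links, e.g. "/posts/1" becomes "#!/posts/1".
  formatURL: function(url) {
    return '#!'+url;
  }

  })
);

})();
```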

```
App.Router.reopen({
  location: 'hashbang',
});
```

#8

Thanks, Alex. That looks really interesting. I might be able to use it. A couple of questions:

  1. I’m confused by all the hashbang stuff… is this meant for an app that uses HTML5 history?

  2. How do you consider it to be usable (there’s no license information)? Does that mean this applies? :wink:


#9

The solution was created based on Google’s documentation, but you can use it without hashbang.

https://developers.google.com/webmasters/ajax-crawling/docs/getting-started

You can use ajax-seo freely. I had forgotten to include a license; I have just added the MIT license.


#10

@stusalsbury, have you tested the proposed solution?


#11

No, I’m “frying some other fish” right now. I will certainly let you know when I get to this. I’m glad to know that it should work for non-hashed URLs.


#12

If you have any questions, just ask.

Thanks.


#13

This is something I’m going to need to also be thinking about very soon. Glad to see this post! @alexsferreira the app looks very useful, I’ll be in touch if I have any questions. Thanks again.


#14

@Scott_Baggett, if you have any difficulty using ajax-seo, get in touch.

Best regards


#15

Here is an example of a running Ember.js application, created with emberGen and indexed with the help of seoJS.

Application website: http://seojs.alexferreira.eti.br/

The application is already beginning to be indexed on Google: http://goo.gl/jIA1F

If you are curious, open any link from the Google results and view the source code.


#16

And how do I run SeoJS on a server, like DreamHost? :s Sorry for the dull question.


#17

If you have SSH access to DreamHost, you can run it: just download the PhantomJS package onto the server and run seoJs with it.

Bear in mind that I have not tested it on DreamHost, but I have had no problems using it.


#18

The way I approached this was to extend HashLocation to get hashbang-style URLs. I then used nginx to forward requests of the form http://www.myapp.com/?_escaped_fragment_= to an instance of PhantomJS running a custom script that calls my app, loads the page, strips script tags and other junk that isn’t needed, caches the result, and returns it. Works fairly well.

This also works very well for linking dynamic pages with the right Open Graph info on Facebook, as they use the same #! scheme as Google.
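
For anyone unfamiliar with the scheme: under Google’s AJAX crawling convention, a crawler requests `?_escaped_fragment_=/some/path` in place of `#!/some/path`. The two pure steps of the setup described in this post (mapping the crawler URL back to the hashbang URL, and stripping scripts from the rendered page) might be sketched like this. The function names are hypothetical, and the regex-based stripping is deliberately crude; a real script would more likely remove the nodes from the DOM inside PhantomJS.

```javascript
// Converts a crawler request such as
//   http://www.myapp.com/?_escaped_fragment_=/posts/1
// into the hashbang URL the Ember app understands:
//   http://www.myapp.com/#!/posts/1
// Returns null when the request is not an escaped-fragment request.
function escapedFragmentToHashbang(requestUrl) {
  const url = new URL(requestUrl);
  const fragment = url.searchParams.get('_escaped_fragment_');
  if (fragment === null) {
    return null;
  }
  url.searchParams.delete('_escaped_fragment_');
  url.hash = '#!' + fragment;
  return url.toString();
}

// Crude sketch: removes <script>...</script> elements from rendered HTML
// so the snapshot served to crawlers is static.
function stripScripts(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');
}
```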


#19

We provide a service that simplifies snapshot creation, hosts the snapshots, and serves them to search engine bots.

It works with hashbang or pushState (adding in your tag) and provides some interesting features, such as:

  • the possibility to specify an HTTP status code to be returned to bots for crawled routes/paths (especially useful for 404)
  • crawling the site in advance and at regular intervals, so that up-to-date captures are always served without the long response times of real-time rendering
  • a mechanism to detect when the page is ready to capture, automatically or programmatically (through a CSS selector or a callback)

This may help you if you need a solution to index your Ember SPA. We are in beta, so the service is free to use during this period. Do not hesitate to have a look at it!

More information here: http://www.seo4ajax.com


#20

I have written a blog post about how I do it with my blog, which is a simple ember app: Making your ajax webapp crawlable

My approach uses a lot of the techniques you folks are talking about. My case is pretty straightforward; I don’t really need to render the pages using Phantom or anything. But if you needed to do such things, the concepts in my post would still definitely apply.

One thing I haven’t written about yet is using a sitemap.xml to hint to El Goog which pages to crawl and how often they might be updated.
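
To give an idea of what that might involve, here is a small sketch that generates a sitemap.xml following the sitemaps.org format. The `buildSitemap` and `escapeXml` helpers and the shape of the page objects are assumptions for illustration; `changefreq` is only a hint, and the sitemap protocol allows the values always, hourly, daily, weekly, monthly, yearly, and never.

```javascript
// Escapes characters that are not allowed as-is inside XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: builds a sitemap from a base URL and a list of
// pages shaped like { path: '/posts/1', changefreq: 'weekly' }.
function buildSitemap(baseUrl, pages) {
  const entries = pages.map(function (page) {
    return '  <url>\n' +
      '    <loc>' + escapeXml(baseUrl + page.path) + '</loc>\n' +
      '    <changefreq>' + escapeXml(page.changefreq) + '</changefreq>\n' +
      '  </url>';
  });
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries.join('\n') +
    '\n</urlset>\n';
}
```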