Advice on handling tens of thousands of queried records

Howdy! :wave:

I actually have a few questions but the most important one: How do people deal with queries that return tens of thousands of records/models? I’m working in the medical domain and I need to query a lookup table that can return 70K+ records based on certain criteria. I’m hoping my problem isn’t unique and other folks have addressed this.

The second question: I need to use filters when querying lookup data. However I have a need to filter data after the initial query. I think I need to use peekAll but I’m not sure how to filter “after the fact”.

The third question: Is there a way I can use a combination of peekAll and query to only issue a network request when there are no targeted models in the store?

Thank you, in advance, for your guidance and time.

Since questions 2 and 3 are pretty straightforward I’ll just take a swing at those before getting into the other stuff which is a longer answer…

The second question: I need to use filters when querying lookup data. However I have a need to filter data after the initial query. I think I need to use peekAll but I’m not sure how to filter “after the fact”

Yes! You’re definitely on the right track. Filtering after the fact is usually done with a combination of peekAll and computed properties. You could write a single filter function/CP or use all of the fun computed macros to do the filtering. Assuming you’re filtering based on UI controls or something similar, you would probably do this in a controller or component that has the values of your inputs and the data from your peekAll (if it’s a component, you could either pass the peekAll result into the component or inject the store and use it inside the component). The nice thing about peekAll is that it returns a live array, so you can make a CP chain like so:

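import { computed } from '@ember/object';
import { filterBy } from '@ember/object/computed';

// ...inside a controller or component that has the store injected:
things: computed(function() {
  // peekAll returns a live array, so no dependent keys are needed here
  return this.store.peekAll('some-model');
}),
// returns all "things" with property "truthy" equal to true, updates as the things live-array changes
onlyTrueThings: filterBy('things', 'truthy', true),

otherFilteredThings: computed('things.@each.{name,style}', function() {
  // return things filtered by name and/or style
}),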

The third question: Is there a way I can use a combination of peekAll and query to only issue a network request when there are no targeted models in the store?

If I’m understanding the need here (filter down what’s in the store, and if no records meet the filter, run a new query), then yes, you could definitely do this. The only part that may be more complex is how to trigger the query, and that will depend on your architecture. You could do the filtering in the route and use the result to decide whether or not to requery, but if the filters are determined by your UI that doesn’t make much sense. You could also do it in a component, but then you have to handle the async query states properly, perhaps using ember-concurrency.
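
A minimal sketch of that peek-first, query-on-miss idea, written as a plain function so it doesn’t care whether you call it from a route, component, or service. `peekOrQuery`, `matches`, and `queryParams` are hypothetical names, and the model name and field in the usage comment are made up. The only store methods assumed are `peekAll(modelName)` and `query(modelName, params)`.

```javascript
// Hypothetical helper: look in the store first, hit the network only on a miss.
// `store` is anything exposing Ember Data's peekAll() and query() methods.
async function peekOrQuery(store, modelName, matches, queryParams) {
  // peekAll never triggers a request; it only returns records already loaded.
  const cached = store.peekAll(modelName).filter(matches);
  if (cached.length > 0) {
    return cached;
  }
  // Nothing suitable in the store, so ask the server.
  const fetched = await store.query(modelName, queryParams);
  // Apply the same predicate in case the server returns more than was asked for.
  return fetched.filter(matches);
}

// Usage inside an async function (model name and field are assumptions):
// const codes = await peekOrQuery(
//   this.store,
//   'diagnosis-code',
//   (record) => record.category === 'cardiology',
//   { category: 'cardiology' }
// );
```

One caveat with this approach: a non-empty result from the store doesn’t prove you hold the complete set for that filter, so you may also want to remember which criteria you have already queried.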

But anyway that brings us back around to the first question…

How do people deal with queries that return tens of thousands of records/models?

In my experience (and take this with a grain of salt because it is anecdotal), you probably want to consider not using Ember Data for this particular query/record set. It’s not necessarily obvious from the guides, the documentation, or the general community, but Ember Data is not required, and for some things it’s not even recommended. I worked for a while at a company that made trading software, and we stored lots of market data records in Ember Data (easily tens of thousands of records, sometimes hundreds of thousands). While it actually performed pretty well in general on desktop, we definitely had a lot of problems keeping it in check, and lower-powered devices really struggled. Memory is probably the biggest concern at that scale. Each Ember Data record has its own little state machine and keeps a lot of data beyond the model attributes and relationships that we all know and love. This adds up pretty quickly when you’re talking about that many records. There have also been issues, historically at least, with safely and effectively unloading records from the store.

Ember Data is great for a lot of things, especially stuff like the canonical blog apps or apps where you’re doing full CRUD operations on heavier data records, but that comes with a tradeoff for situations like yours. Especially if you don’t need much of the “state machine” part of Ember Data or this data won’t be making heavy use of relationships, I think it would make a lot more sense to keep your data outside of Ember Data.

So let’s say you decide not to use Ember Data. There are a lot of directions you could go. You could return your data from a route model hook, the same as usual; if you want to bind the query to query params, that might make the most sense. If not, though, I’d say consider a service with either an ember-concurrency task (which will make the UI portion simpler, if nothing else) or a method that fetches the data via a plain XHR and returns a promise or promise proxy. Then you can have filter properties and CPs directly on the service. The advantage of this direction is that all of the logic is contained in one place, and you can inject the service into any route/controller/component that needs it.
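
To make the service idea a bit more concrete, here is a rough sketch of the loading-and-caching part as a plain class; in an app it would live inside an Ember service. `LookupCache` and `fetchRecords` are hypothetical names, and `fetchRecords(criteria)` is assumed to return a promise of plain objects from whatever endpoint you use.

```javascript
// Hypothetical stand-in for an Ember service that owns the lookup data.
// `fetchRecords(criteria)` is an assumed function returning a promise of plain objects.
class LookupCache {
  constructor(fetchRecords) {
    this.fetchRecords = fetchRecords;
    this.pending = new Map(); // criteria key -> promise of records
  }

  // Load once per distinct criteria; later calls reuse the same promise.
  load(criteria) {
    // Key order matters to JSON.stringify, which is good enough for a sketch.
    const key = JSON.stringify(criteria);
    if (!this.pending.has(key)) {
      const request = this.fetchRecords(criteria).catch((error) => {
        // Do not cache failures, so a later call can retry.
        this.pending.delete(key);
        throw error;
      });
      this.pending.set(key, request);
    }
    return this.pending.get(key);
  }

  // Filter already-loaded records in memory, with no extra request.
  async filter(criteria, predicate) {
    const records = await this.load(criteria);
    return records.filter(predicate);
  }
}
```

Because the records are plain objects rather than Ember Data models, you skip the per-record state machine mentioned above, at the cost of handling updates and saving yourself if you ever need them.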

Anyway it’s hard and probably confusing to go into more details without more context but my personal recommendation would be to really consider what parts of Ember Data you need in this case and if it’s not much maybe sidestep it altogether. Then determine a broader architecture for the query and filtering needs. Definitely happy to provide thoughts (and again I want to be clear that this is just my take and others may have better feedback) or suggestions for any of the implementation.

As usual, lots of good stuff to think about. I’m currently researching/exploring ember-concurrency to help. I may have more questions later. :slight_smile:

I would go with a service that takes care of loading and caching the records that you need.

Thank you for the feedback. At the moment I am kind of leaning that way. :slight_smile:

I would ask you this: if you were building your application with just HTML, how would you deal with a query that could return tens of thousands of records?

The answer has always been “pagination.” Even when you implement infinite scrolling, you’re using pagination under the hood. You should send page (& optionally perPage) params along with your query to the server.
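
As a small illustration of sending those params, here is a sketch of a helper that merges lookup filters with paging values. `buildPageQuery` is a hypothetical name, the default page size is arbitrary, and the `category` filter in the comment is made up. The `page` and `perPage` names follow the paragraph above, but your API may expect different ones.

```javascript
// Hypothetical helper: combine lookup filters with paging params for one request.
// The `page` and `perPage` names follow the post above; your API may use others.
function buildPageQuery(filters, page, perPage = 50) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be a positive integer');
  }
  // Paging params are listed last so they win over same-named filter keys.
  return { ...filters, page, perPage };
}

// e.g. store.query('some-model', buildPageQuery({ category: 'cardiology' }, 2, 100))
// passes { category: 'cardiology', page: 2, perPage: 100 } to the adapter.
```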

When I was getting started with Ember, I was very confused about how to deal with query params and how they relate to the query params sent to the server. Once I figured it out, I wrote this little addon; however, I think it’s better viewed as a guide to implementing pagination with Ember and Ember Data than used as an addon.

Keep in mind that this will only be computed once, so if you make subsequent queries to load more some-models, they won’t be present in things

they won’t be present in things

Not true actually. peekAll returns a live-updating RecordArray so while the CP only gets computed once the result of the CP is a reference to said live array (and multiple calls to peekAll would only return the same reference anyway) and you can observe it for changes, added or removed records included.
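
If it helps, here is a plain JavaScript analogy for the reference part of that explanation. It is not Ember Data code: `FakeStore` and `things()` are made up for illustration, and the sketch leaves out the change notifications that let templates and observers react.

```javascript
// Plain JavaScript analogy, not Ember Data itself: the "store" hands out
// the same array every time and mutates it in place as records arrive.
class FakeStore {
  constructor() {
    this.live = [];
  }
  peekAll() {
    return this.live;
  }
  push(record) {
    this.live.push(record);
  }
}

const store = new FakeStore();
let cachedThings; // plays the role of a computed property that runs once

function things() {
  if (cachedThings === undefined) {
    cachedThings = store.peekAll();
  }
  return cachedThings;
}

const before = things().length; // 0
store.push({ id: 1 });
const after = things().length; // 1, although peekAll() ran only once
```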

oh snap! TIL, thanks :smiley:

Yeah, it’s very nice but can also be a little confusing, especially if you expect a query result array to live-update the same way, because those don’t (which seems to be a pretty common tripping point).

Yeah, my rule of thumb has basically been to never define a dependent-less computed property, because it’s just a bug waiting to happen. As a matter of fact, I just stumbled on this bug yesterday, which causes new mobiledoc editors to never receive updated options :man_facepalming:

But doing peekAll is a good exception to my rule :wink:

Paginate as @shull said.

If you’re bringing back tens of thousands of records so that you can do some aggregation of the data on the front end, I’d suggest not doing that and instead use a database VIEW on your backend to produce the results you desire. Typically a database VIEW can be configured as an immutable model+resource inside your back-end (e.g. Rails). You would then create a model in Ember Data that matches the VIEW and this may help you bring back fewer data records.

Even more advanced, you can create database VIEWS and then use your back-end (e.g. Rails) to create relationships from your existing related models to these VIEWS. This is probably a bit more complicated than what you need right now, but if you think about this for a bit you’ll probably see where I’m going with this… Feel free to ask me for more detail. Good luck!

Sorry for the late reply, but to close the loop on the approach I finally settled on: implementing an ‘endless scroll’ feature for some drop-downs.

For the moment I’m punting on peekAll; I need to revisit because of changing priorities - per usual, eh?

OK, the general problem is this: I have a model where a change to one of its attributes affects several lists of values for other attributes and/or associated data. I’m using ember-power-select to implement the drop-downs, and I luckily found this Medium article: Lazy Loading With Ember Power Select Using Contextual Components | by Brandon Drake | Medium

I had to change my implementation to account for version differences, but when I say I got lucky, I truly got lucky. I was able to learn more about ember-concurrency, too. Essentially, it’s pagination plus a “fetch more” that fires when a special list element becomes visible.
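
For anyone landing here later, the general shape of that approach can be sketched in plain JavaScript as follows. This is not the code from the article or from my app: `PagedLoader` and `fetchPage` are hypothetical names, `fetchPage({ page, perPage })` is assumed to return a promise of an array, and a simple `isLoading` flag stands in for what an ember-concurrency task would handle.

```javascript
// Hypothetical "load more" state holder for an endless-scroll list.
// `fetchPage({ page, perPage })` is an assumed function returning a promise of an array.
class PagedLoader {
  constructor(fetchPage, perPage = 25) {
    this.fetchPage = fetchPage;
    this.perPage = perPage;
    this.page = 0;
    this.items = [];
    this.hasMore = true;
    this.isLoading = false;
  }

  // Call this when the sentinel element at the end of the list becomes visible.
  async loadMore() {
    // Ignore overlapping triggers and calls after the last page.
    if (this.isLoading || !this.hasMore) return this.items;
    this.isLoading = true;
    try {
      const batch = await this.fetchPage({ page: this.page + 1, perPage: this.perPage });
      this.page += 1;
      this.items = this.items.concat(batch);
      // A short page signals the end of the data.
      this.hasMore = batch.length === this.perPage;
    } finally {
      this.isLoading = false;
    }
    return this.items;
  }
}
```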

I’m sorry I didn’t get back sooner to close the loop, but I want to thank everyone for their suggestions. Thanks! :smiley: :heart:
