Sorting And Performance In Apex
./img/joys-of-apex-thumbnail.png
Recently, a reader came to me, wanting to discuss the potential performance implications of using the DML mocking framework I have been espousing here:
Please stop claiming performance on something that isn't doing the same thing as FFLib. Additionally, in test (sic) you're not sorting, so using your Crud implementation to actually insert is going to be slower than directly inserting, anywhere from n log n to n^2 slower.
I always find it endearing when the formulas come out. As a small business owner, I want to ensure that my clients are getting the best in the way of performance; to some extent, that's what I'm selling, but beyond that, I also consider it a moral obligation to separate feelings from code. My point is not to recommend things to people because I feel they are right. Rather, the whole point of this series has been to show using TDD and numbers that if you want big improvements in your test time, you can follow the methodology I'm describing (more so than any actual piece of code) to reduce the length of time it takes for your unit tests to run.
When I talk about the use of the Factory pattern as a way to use dependency injection to give your classes access to DML methods that can then be easily mocked, my intent is to show one possible way to utilize this methodology. As a good friend of mine likes to say:
The best line of code is the one never written.
So ... don't make sweeping changes if you don't have to. Empower yourself through an education in what's possible to decide how to best make positive changes in your/your team's development. Reducing testing time (thus speeding iteration) is one possible means to that end.
With that being said, I thought that beyond the language this reader in question chose to employ, their underlying assertion was an interesting one, and something that should be tested. To refresh your memory, I'll post just the relevant snippets:
1link//these days, on a greenfield project,
2link//I'd really call this class DML
3linkpublic virtual class Crud implements ICrud {
4link @testVisible private static Integer MAX_DML_CHUNKING = 10;
5link //....
6link public virtual List<SObject> doInsert(List<SObject> records) {
7link this.sortToPreventChunkingErrors(records);
8link Database.insert(records);
9link return records;
10link }
11link //etc, sortToPreventChunkingErrors is called
12link //in update/upsert as well
13link
14link private void sortToPreventChunkingErrors(List<SObject> records) {
15link //prevents a chunking error that can occur if
16link //SObject types are in the list out of order.
17link //no need to sort if the list size is below the limit
18link if(records.size() >= MAX_DML_CHUNKING) {
19link records.sort();
20link }
21link }
22link}
But is the production usage of Crud
actually causing O(n log n)
(linearithmic) to O(2^n)
(quadratic) slowness? This seems like something that we can test fairly easily. Of course, to test "in production" (mais c'est en fait "in situ", non?), we have access to the wonderful world of Anonymous Apex. Let's build something that allows us to test the reader's assertion, getting a baseline performance indication from what happens when calling Database.insert
in order to compare against the Crud
implementation.
Our pseudo test-runner needs the following qualities:
It's also not enough to test one of the Crud
methods against only the baseline Database.insert
method; I also need to update the implementation to make sorting optional so that I can measure whether or not adding sorting significantly affects processing time:
1linkpublic virtual class Crud implements ICrud {
2link public static Boolean SORT_CHUNKS = false;
3link //....
4link private void sortToPreventChunkingErrors(List<SObject> records) {
5link //prevents a chunking error that can occur
6link //if SObject types are in the list out of order.
7link //no need to sort if the list size is below the limit\
8link if(SORT_CHUNKS && records.size() >= MAX_DML_CHUNKING) {
9link records.sort();
10link }
11link }
12link}
Alright, let's look at the Anonymous Apex:
1linkpublic abstract class Function {
2link private Datetime now = System.now();
3link
4link protected abstract void call();
5link protected abstract String getTypeName();
6link
7link public Function() {
8link Savepoint sp = Database.setSavePoint();
9link this.recordTime('Starting for ' + getTypeName());
10link this.call();
11link this.recordTime('Ending for ' + getTypeName());
12link Database.rollBack(sp);
13link }
14link
15link protected List<Account> getAccounts() {
16link List<Account> accounts = new List<Account>();
17link //despite each Database.rollback
18link //the DML rows still count towards the max
19link //of 10,000 DML rows per sync transaction
20link //and we need to give breathing room
21link //for the max of 10 seconds
22link //per Anonymous Apex transaction
23link for(Integer index = 0; index < 2000; index++) {
24link accounts.add(
25link new Account(
26link Name = 'Test' + Math.random(),
27link NumberOfEmployees = index,
28link Phone = String.valueOf(index),
29link Sic = '57340',
30link YearStarted = String.valueOf((Math.random() * 4).intValue())
31link )
32link );
33link }
34link return accounts;
35link }
36link
37link protected void recordTime(String startString) {
38link System.debug(startString + ': ' + getSecondsPassed() + ' seconds elapsed');
39link now = System.now();
40link }
41link
42link private Integer getSecondsPassed() {
43link return Datetime.newInstance(System.now().getTime() - now.getTime()).second();
44link }
45link}
46link
47linkpublic class DatabaseInsert extends Function {
48link protected override String getTypeName() { return 'Database.insert'; }
49link
50link protected override void call() {
51link Database.insert(this.getAccounts());
52link }
53link}
54link
55linkpublic class DMLFunctionWithSorting extends Function {
56link protected override String getTypeName() { return 'Crud (with sorting)'; }
57link
58link protected override void call() {
59link Crud.SORT_CHUNKS = true;
60link new Crud().doInsert(this.getAccounts());
61link }
62link}
63link
64linkpublic class DMLFunctionNoSorting extends Function {
65link protected override String getTypeName() { return 'Crud (no sorting)'; }
66link
67link protected override void call() {
68link Crud.SORT_CHUNKS = false;
69link new Crud().doInsert(this.getAccounts());
70link }
71link}
72link
73linknew DatabaseInsert();
74linknew DMLFunctionWithSorting();
75linknew DMLFunctionNoSorting();
A few notes on the Function
paradigm for the footnotes, and running that whole block in Anonymous Apex yields:
The Function
paradigm was inspired by an article I read recently on fluent iterators, which is well worth the read. I've been digging through their source code and having fun with it. There's so much good food for thought in their codebase, which is linked in the article! I've also since published a post about Lazy Iterators which you may also find interesting.
1link$ USER_DEBUG [24]|DEBUG|Starting for Database.insert: 0 seconds elapsed
2link$ USER_DEBUG [24]|DEBUG|Ending for Database.insert: 9 seconds elapsed
3link$ USER_DEBUG [24]|DEBUG|Starting for Crud (with sorting): 0 seconds elapsed
4link$ USER_DEBUG [24]|DEBUG|Ending for Crud (with sorting): 8 seconds elapsed
5link$ USER_DEBUG [24]|DEBUG|Starting for Crud (no sorting): 0 seconds elapsed
6link$ USER_DEBUG [24]|DEBUG|Ending for Crud (no sorting): 8 seconds elapsed
Très intéressant, as they say. I ran this snippet a few times just to make sure that my eyes weren't deceiving me — with the same results.
To get better granularity (and to prevent rounding errors) on the seconds elapsed, let's swap out those getSecondsPassed
Integers for Decimals:
1linkprotected void recordTime(String startString) {
2link System.debug(startString + ': ' + getSecondsPassed().format() + ' seconds elapsed');
3link now = System.now();
4link}
5link
6linkprivate Decimal getSecondsPassed() {
7link Integer millisecondsPassed = Datetime.newInstance(
8link System.now().getTime() - now.getTime()
9link ).millisecond();
10link return Decimal.valueOf(millisecondsPassed) / 60;
11link}
A few notes on the Function
paradigm for the footnotes, and running that whole block in Anonymous Apex yields:
This is true for many languages, but you have to run things many times in order to get something approximating consistent results. I ran just the decimal formatted test more than 30 times, and the test results varied wildly between runs. I consistently found that the native insert method lagged behind the Crud method, with or without sorting, but at times the seconds difference between each invocation varied so much as to render the results unstudyable; at times, the sorting method was fastest. This is the infuriating nature of data science - trying to achieve consistency across things that are by nature inconsistent.
1link$#test run 32 ...
2link$USER_DEBUG [32]|DEBUG|Starting for Database.insert: 0.017 seconds elapsed
3link$USER_DEBUG [32]|DEBUG|Ending for Database.insert: 11.783 seconds elapsed
4link$USER_DEBUG [32]|DEBUG|Starting for Crud (with sorting): 0.017 seconds elapsed
5link$USER_DEBUG [32]|DEBUG|Ending for Crud (with sorting): 6.833 seconds elapsed
6link$USER_DEBUG [24]|DEBUG|Starting for Crud (no sorting): 0.017 seconds elapsed
7link$USER_DEBUG [32]|DEBUG|Ending for Crud (no sorting): 6.45 seconds elapsed
8link$#test run 31 ...
9link$USER_DEBUG [24]|DEBUG|Starting for Database.insert: 0.017 seconds elapsed
10link$USER_DEBUG [32]|DEBUG|Ending for Database.insert: 5.15 seconds elapsed
11link$USER_DEBUG [24]|DEBUG|Starting for Crud (with sorting): 0.017 seconds elapsed
12link$USER_DEBUG [32]|DEBUG|Ending for Crud (with sorting): 8.317 seconds elapsed
13link$USER_DEBUG [24]|DEBUG|Starting for Crud (no sorting): 0 seconds elapsed
14link$USER_DEBUG [32]|DEBUG|Ending for Crud (no sorting): 4.867 seconds elapsed
15link$#test run 30 ...
16link$USER_DEBUG [24]|DEBUG|Starting for Database.insert: 0.017 seconds elapsed
17link$USER_DEBUG [32]|DEBUG|Ending for Database.insert: 11.817 seconds elapsed
18link$USER_DEBUG [24]|DEBUG|Starting for Crud (with sorting): 0 seconds elapsed
19link$USER_DEBUG [32]|DEBUG|Ending for Crud (with sorting): 1.933 seconds elapsed
20link$USER_DEBUG [24]|DEBUG|Starting for Crud (no sorting): 0 seconds elapsed
21link$USER_DEBUG [32]|DEBUG|Ending for Crud (no sorting): 3.783 seconds elapsed
22link$#test run 29 ...
23link$USER_DEBUG [24]|DEBUG|Starting for Database.insert: 0.017 seconds elapsed
24link$USER_DEBUG [32]|DEBUG|Ending for Database.insert: 2.15 seconds elapsed
25link$USER_DEBUG [32]|DEBUG|Starting for Crud (with sorting): 0.017 seconds elapsed
26link$USER_DEBUG [32]|DEBUG|Ending for Crud (with sorting): 11.867 seconds elapsed
27link$USER_DEBUG [24]|DEBUG|Starting for Crud (no sorting): 0.017 seconds elapsed
28link$USER_DEBUG [32]|DEBUG|Ending for Crud (no sorting): 4.783 seconds elapsed
29link$#test run 28 ...
30link$USER_DEBUG [24]|DEBUG|Starting for Database.insert: 0.017 seconds elapsed
31link$USER_DEBUG [32]|DEBUG|Ending for Database.insert: 14.067 seconds elapsed
32link$USER_DEBUG [24]|DEBUG|Starting for Crud (with sorting): 0 seconds elapsed
33link$USER_DEBUG [32]|DEBUG|Ending for Crud (with sorting): 4.533 seconds elapsed
34link$USER_DEBUG [24]|DEBUG|Starting for Crud (no sorting): 0 seconds elapsed
35link$USER_DEBUG [32]|DEBUG|Ending for Crud (no sorting): 13.833 seconds elapsed
36link$#test run 27
37link$USER_DEBUG [24]|DEBUG|Starting for Database.insert: 0.017 seconds elapsed
38link$USER_DEBUG [32]|DEBUG|Ending for Database.insert: 3.15 seconds elapsed
39link$USER_DEBUG [24]|DEBUG|Starting for Crud (with sorting): 0 seconds elapsed
40link$USER_DEBUG [32]|DEBUG|Ending for Crud (with sorting): 0.4 seconds elapsed
41link$USER_DEBUG [24]|DEBUG|Starting for Crud (no sorting): 0.017 seconds elapsed
42link$USER_DEBUG [32]|DEBUG|Ending for Crud (no sorting): 1.417 seconds elapsed
43link$#.... etc
I could keep going. Anybody can run these test at home though, so I hardly think it necessary. Out of my 30+ sample size test runs, let's take the 6 results above and do some tabulation & classic statistical modeling:
Method | Tabulation | Value |
---|---|---|
Database.insert | Average | 8.0195 |
Database.insert | Standard Deviation | 5.12941099 |
Crud (sorting) | Average | 5.647166667 |
Crud (sorting) | Standard Deviation | 4.237680895 |
Crud (no sorting) | Average | 5.8555 |
Crud (no sorting) | Standard Deviation | 4.245394293 |
The 95% confidence interval (2.3723 ± 6.05222962) comparing Database.insert
to the sorting method results in a range from 0 - 8.42456362 with a standard error of 2.72. The sort
method isn't even close to operating in quadratic / linearithmic time. Given that the sorting average is within the first standard deviation for not sorting, most statisticians would probably just throw their hands up and give up the comparison entirely.
In other words, there's no significant performance degradation with the use of the baseline Crud
library. Indeed, in the vast majority of instances, it compares favorably with the baseline implementation, even if sorting is used. Statistically speaking, any performance degradation (if it did occur) isn't even close to significance, but generally speaking, it's likely that your production performance bottleneck isn't going to be determined by how you insert/update/upsert things; it's going to be determined by how efficient the handler(s) you have that are called by said actions.
To be clear — you can do a lot more with the FFLib library. My point in writing this whole series has not been to villify FFLib — it's been to contend that if the vast majority of testing time is sunk into DML, you might not need to mock every other dependency that you have, and that if the use of those mocks doesn't actually measurably speed up your testing time, you'd be better off not using the library at all. Merely mocking queries/DML will save you huge increments of time with a much smaller mocking footprint. Focusing on small changes that produce big wins is a proven success strategy, and I hope that you'll be encouraged by these results to think about how you can optimize your/your team's development accordingly.
Using an example relying on a string field, as per the original answer on the Salesforce Stack Exchange, is brilliant, by the way. Most other fields you'd want to sort on, assuming that your custom sorting depended on only one field at all, don't have access to a built-in method to provide the 1, -1, or 0. There's no "compareTo" for Dates, or Datetimes, for example.
One of the big takeaways from the above is that the default sorting implementation in Apex, at least for SObjects, is absurdly performant. That's really good news. But what happens if you need to sort your SObjects very specifically?
Moving on to another interesting subject — custom sorting. Some of you may recall that in the footnotes for the Joys Of Apex post on Enums, I spoke about the Comparable
interface in Apex, part of the standard library for Apex. To review, classes looking to implement Comparable
have to conform to the following contract:
1link//where the Integer returned
2link//should be either:
3link//-1 for less than, 1 for greater than
4link//or 0 for functionally equivalent
5linkglobal Integer compareTo(Object compareTo);
There's a fairly well-viewed Salesforce stack exchange post on creating a Comparator class, the accepted answer for which features an .... object-oriented-lite ... version of a way to easily implement custom sorting within Apex. I'm going to show you the code that is suggested there, and then massage it to be more idiomatic. On the subject of performance, however, both implementations feature placeholder-lists, which is presumably the source for much of the performance overhead when it comes to using a custom sorter.
The original answer:
1link//I've posted the entirety of the response, unedited:
2linkpublic abstract class Comparator {
3link public abstract Integer compare(Object o1, Object o2);
4link public static void sort(Object[] values, Comparator comp) {
5link // Obtain the list type of values
6link Object[] temp = values.clone();
7link temp.clear();
8link // Helper class for sorting using Comparable
9link Helper[] tempValues = new Helper[0];
10link for(Object value: values) {
11link tempValues.add(new Helper(comp, value));
12link }
13link // Perform sort
14link tempValues.sort();
15link // Extract values back into temp list
16link for(Helper helper: tempValues) {
17link temp.add(helper.value);
18link }
19link // And set the list to the new, sorted order
20link values.clear();
21link values.addAll(temp);
22link }
23link // Simply calls Comparator when asked.
24link class Helper implements Comparable {
25link Comparator method;
26link Object value;
27link Helper(Comparator comp, Object val) {
28link method = comp;
29link value = val;
30link }
31link public Integer compareTo(Object o) {
32link return method.compare(value, ((Helper)o).value);
33link }
34link }
35link}
36link
37link//From here, you can create your own solutions:
38link
39linkpublic class AccountNameComparator extends Comparator {
40link public override Integer compare(Object a, Object b) {
41link return ((Account)a).name.compareTo(((Account)b).name);
42link }
43link}
44link
45link//Which would let you sort as you like:
46linkAccount[] accounts = [SELECT Name FROM Account ORDER BY CreatedDate LIMIT 5];
47linkComparator.sort(accounts, new AccountNameComparator());
One of the things that I try to stress here is the importance of naming. Using names like Helper
doesn't tell anybody what you're up to. Likewise, the use of a static method on Comparator
is kind of a bummer. Let's encapsulate the sorting behavior, rename things, and expose a better interface using the same example of sorting by Account name [^5]:
1linkpublic abstract class Comparator {
2link public abstract Integer compare(Object o1, Object o2);
3link
4link public void sort(Object[] values) {
5link Object[] temp = new List<Object>();
6link ItemWrapper[] wrappedItems = new List<ItemWrapper>();
7link
8link for(Object value: values) {
9link wrappedItems.add(new ItemWrapper(this, value));
10link }
11link
12link wrappedItems.sort();
13link
14link for(ItemWrapper item: wrappedItems) {
15link temp.add(item.value);
16link }
17link
18link values.clear();
19link values.addAll(temp);
20link }
21link
22link private class ItemWrapper implements Comparable {
23link private final Comparator comparer;
24link private final Object value;
25link
26link public ItemWrapper(Comparator comparer, Object value) {
27link this.comparer = comparer;
28link this.value = value;
29link }
30link
31link public Integer compareTo(Object o) {
32link return comparer.compare(value, ((ItemWrapper)o).value);
33link }
34link }
35link}
36link
37link//same as before
38linkpublic class AccountNameComparator extends Comparator {
39link public override Integer compare(Object a, Object b) {
40link return ((Account)a).Name.compareTo(((Account)b).Name);
41link }
42link}
43link
44link//much cleaner
45linkAccount[] accounts = [SELECT Name FROM Account ORDER BY CreatedDate LIMIT 5];
46linknew AccountNameComparator().sort(accounts);
By passing the Comparator
instance using the this
keyword, we completely eliminate the static method and instead allow consumers of custom sorting to simply define their sorting algorithm in a class extending Comparator
prior to calling the sort
method with their list. That's properly object-oriented.
Let's revisit our Function
class to observe whether or not custom sorting is really going to bite us in production:
1link//with getAccounts() set to return 10,000 rows
2linkpublic class BaselineSorting extends Function {
3link protected override String getTypeName() { return 'Baseline sorting'; }
4link
5link protected override void call() {
6link List<Account> accounts = this.getAccounts();
7link accounts.sort();
8link }
9link}
10link
11linkpublic class CustomSorting extends Function {
12link protected override String getTypeName() { return 'Custom sorting'; }
13link
14link protected override void call() {
15link List<Account> accounts = this.getAccounts();
16link new AccountNameComparator().sort(accounts);
17link }
18link}
19link
20linknew BaselineSorting();
21linknew CustomSorting();
Answer — it depends, but probably yes. Again, there's quite a bit of variance in run times:
1link$#Run 5
2link$USER_DEBUG [24]|DEBUG|Starting for Baseline sorting: 0.017 seconds elapsed
3link$USER_DEBUG [24]|DEBUG|Ending for Baseline sorting: 5.067 seconds elapsed
4link$USER_DEBUG [24]|DEBUG|Starting for Custom sorting: 0 seconds elapsed
5link$USER_DEBUG [24]|DEBUG|Ending for Custom sorting: 15.25 seconds elapsed
6link$
7link$#Run 4
8link$USER_DEBUG [24]|DEBUG|Starting for Baseline sorting: 0.017 seconds elapsed
9link$USER_DEBUG [24]|DEBUG|Ending for Baseline sorting: 4.617 seconds elapsed
10link$USER_DEBUG [24]|DEBUG|Starting for Custom sorting: 0.017 seconds elapsed
11link$USER_DEBUG [24]|DEBUG|Ending for Custom sorting: 16.017 seconds elapsed
12link$
13link$#Run 3 (what happened here?!)
14link$USER_DEBUG [24]|DEBUG|Starting for Baseline sorting: 0.017 seconds elapsed
15link$USER_DEBUG [24]|DEBUG|Ending for Baseline sorting: 5.433 seconds elapsed
16link$USER_DEBUG [24]|DEBUG|Starting for Custom sorting: 0.017 seconds elapsed
17link$USER_DEBUG [24]|DEBUG|Ending for Custom sorting: 1.317 seconds elapsed
18link$
19link$#Run 2
20link$USER_DEBUG [24]|DEBUG|Starting for Baseline sorting: 0 seconds elapsed
21link$USER_DEBUG [24]|DEBUG|Ending for Baseline sorting: 4.8 seconds elapsed
22link$USER_DEBUG [24]|DEBUG|Starting for Custom sorting: 0.017 seconds elapsed
23link$USER_DEBUG [24]|DEBUG|Ending for Custom sorting: 14.467 seconds elapsed
24link$
25link$#Run 1
26link$USER_DEBUG [24]|DEBUG|Starting for Baseline sorting: 0 seconds elapsed
27link$USER_DEBUG [24]|DEBUG|Ending for Baseline sorting: 6.533 seconds elapsed
28link$USER_DEBUG [24]|DEBUG|Starting for Custom sorting: 0 seconds elapsed
29link$USER_DEBUG [24]|DEBUG|Ending for Custom sorting: 15.767 seconds elapsed
That being said, who is actually sorting 10k Accounts at a given time? When I'm using custom sorting, it's typically on a couple hundred records at a given time. The performance cost may well be negligible for numbers like that — if the above example takes .283 seconds to sort 200 Accounts, you might not even notice the difference. Again, in FinTech, thousandths of a second matter. In most other arenas ... they don't. Plan accordingly.
The TL;DR would probably look something like this:
Crud
is going to noticeable slow down your production instance's DML statements is demonstrably falseThanks for reading the latest in the the Joys Of Apex. I hope you enjoyed — till next time!
The original version of Sorting & Performance In Apex can be read on my blog.
Edit:
The same reader pointed out that the list I was using in the original Function
example was essentially pre-sorted, and that because SObject comparisons compare all fields supplied on the object (as well as its SObjectType
label), I was potentially misrepresenting the results displayed by virtue of only having each account initialized based off of a Name field with the string value of the list's index. It seems that the reader's first issue — that sorting large lists could lead to catastrophic performance concerns — had been transformed into the argument that performing additional comparisons was going to be inordinately costly.
I've updated the article (and the subsequent figures) to reflect Accounts inserted into the list in a far more random fashion, with more fields filled out, to show that it doesn't significantly affect performance. Indeed, the numbers are virtually the same. No matter, though — I'm sure I'll have to fire up JMeter and livestream the new quadsort algorithm before this particular person is happy 🤣
Home Apex Logging Service Apex Object-Oriented Basics Batchable And Queueable Apex Building A Better Singleton Continuous Integration With SFDX Dependency Injection & Factory Pattern Enum Apex Class Gotchas Extendable Apis Formula Date Issues Future Methods, Callouts & Callbacks Idiomatic Salesforce Apex Introduction & Testing Philosophy Lazy Iterators Lightweight Trigger Handler LWC Composable Modal LWC Composable Pagination LWC Custom Lead Path Mocking DML React Versus Lightning Web Components Refactoring Tips & Tricks Replacing DLRS With Custom Rollup Repository Pattern setTimeout & Implementing Delays Sorting And Performance In Apex Test Driven Development Example Testing Custom Permissions Writing Performant Apex Tests