Spring Data JPA: What Is Good and What Is Bad

A tiny son came to his father,
and the little one asked:
- What is good
and what is bad?

Vladimir Mayakovsky

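This article is about Spring Data JPA: the hidden rakes I stepped on along the way and, of course, a little about performance.

The examples described in the article can be run in the accompanying test repository.

A note for those who have not yet migrated to Spring Boot 2

In Spring Data JPA 2.* the main interface for working with repositories, CrudRepository (which JpaRepository extends), has changed. In versions 1.* the main methods looked like this:

    public interface CrudRepository<T, ID> {
        T findOne(ID id);
        List<T> findAll(Iterable<ID> ids);
    }

In the new versions:

    public interface CrudRepository<T, ID> {
        Optional<T> findById(ID id);
        List<T> findAllById(Iterable<ID> ids);
    }

So, let's begin.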

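select t.* from t where t.id in (...)

One of the most common queries has the form "select all records whose key is in the given set". I am sure almost all of you have written or seen something like this:

    @Query("select ba from BankAccount ba where ba.user.id in :ids")
    List<BankAccount> findByUserIds(@Param("ids") List<Long> ids);

    @Query("select ba from BankAccount ba where ba.user.id in :ids")
    List<BankAccount> findByUserIds(@Param("ids") Set<Long> ids);

These are working, perfectly suitable queries with no catch and no performance problems, but they have one small, entirely non-obvious drawback.

Before opening the spoiler, try to think about it yourself.

The drawback is that the interface used to pass the keys is too narrow. "So what?" you will say. "A list is a list, I see no problem here." However, if we look at the methods of the root interface that take many values, we see Iterable everywhere: CrudRepository::saveAll, CrudRepository::findAllById and CrudRepository::deleteAll all accept an Iterable.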

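"So what? I want a list. Why is it any worse?"
It is no worse, just be prepared for code like this to appear at a higher level of your application:

    public List<BankAccount> findByUserId(List<Long> userIds) {
        Set<Long> ids = new HashSet<>(userIds);
        return repository.findByUserIds(ids);
    }

    // or

    public List<BankAccount> findByUserIds(Set<Long> userIds) {
        List<Long> ids = new ArrayList<>(userIds);
        return repository.findByUserIds(ids);
    }

This code does nothing but re-wrap collections. The method argument may be a list while the repository method accepts a set (or vice versa), so you have to re-wrap the collection just to make the code compile. Of course, compared with the overhead of the query itself this will not become a problem; it is simply unnecessary motion.

So using Iterable is good practice:

    @Query("select ba from BankAccount ba where ba.user.id in :ids")
    List<BankAccount> findByUserIds(@Param("ids") Iterable<Long> ids);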
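The effect of the wider parameter type can be seen without Spring at all. The following is a minimal sketch in plain Java: IterableParameterSketch, its Account class and the in-memory ACCOUNTS list are hypothetical stand-ins for a repository and its table, not part of Spring Data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for a repository; not a Spring Data API.
class IterableParameterSketch {

    // Hypothetical row of the bank_account table.
    static final class Account {
        final long id;
        final long userId;

        Account(long id, long userId) {
            this.id = id;
            this.userId = userId;
        }
    }

    static final List<Account> ACCOUNTS = Arrays.asList(
            new Account(1, 10), new Account(2, 20), new Account(3, 10));

    // Because the parameter is an Iterable, a List, a Set or any other
    // collection can be passed as is, without re-wrapping at the call site.
    static List<Account> findByUserIds(Iterable<Long> ids) {
        Set<Long> wanted = new HashSet<>();
        ids.forEach(wanted::add);
        List<Account> result = new ArrayList<>();
        for (Account account : ACCOUNTS) {
            if (wanted.contains(account.userId)) {
                result.add(account);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> asList = Arrays.asList(10L);
        Set<Long> asSet = new HashSet<>(Arrays.asList(10L, 20L));
        System.out.println(findByUserIds(asList).size()); // the List is passed directly
        System.out.println(findByUserIds(asSet).size());  // so is the Set
    }
}
```

Both callers compile against the same method, so the service layer never needs to copy one collection into another.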

P.S. When it comes to a *RepositoryCustom method, it makes sense to use Collection, which makes it easy to check the size inside the implementation:

    public interface BankAccountRepositoryCustom {
        boolean anyMoneyAvailable(Collection<Long> accountIds);
    }

    public class BankAccountRepositoryImpl implements BankAccountRepositoryCustom {
        @Override
        public boolean anyMoneyAvailable(Collection<Long> accountIds) {
            if (accountIds.isEmpty()) return false;
            //...
        }
    }

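Redundant code: non-repeating keys

Continuing the previous section, I want to draw attention to a common misconception:

    @Query("select ba from BankAccount ba where ba.user.id in :ids")
    List<BankAccount> findByUserIds(@Param("ids") Set<Long> ids);

Other manifestations of the same mistake:

    Set<Long> ids = new HashSet<>(notUniqueIds);
    List<BankAccount> accounts = repository.findByUserIds(ids);

    List<Long> ids = ts.stream().map(T::id).distinct().collect(toList());
    List<BankAccount> accounts = repository.findByUserIds(ids);

    Set<Long> ids = ts.stream().map(T::id).collect(toSet());
    List<BankAccount> accounts = repository.findByUserIds(ids);

At first glance, nothing unusual, right?

Take your time and think for yourself ;)

HQL/JPQL queries of the form select t from t where t.field in ... eventually turn into the query

    select b.* from BankAccount b where b.user_id in (?, ?, ?, ?, ?, …)

which always returns the same result regardless of whether the argument contains duplicates. Ensuring that the keys are unique is therefore unnecessary. There is one special case, Oracle, where passing more than 1000 keys to in results in an error. But if you are trying to reduce the number of keys by removing duplicates, you should rather ask why the duplicates appear in the first place. Most likely the mistake is somewhere higher up.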
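The claim about duplicates can be checked with a small model of the in predicate. This is only a sketch in plain Java: InWithDuplicatesSketch and its USER_IDS_OF_ROWS data are hypothetical, and the filter merely imitates the semantics of "where user_id in (...)", it does not talk to a database.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class InWithDuplicatesSketch {

    // Hypothetical user_id values of the rows in bank_account.
    static final List<Long> USER_IDS_OF_ROWS = Arrays.asList(10L, 20L, 10L, 30L);

    // Models "where user_id in (...)": a row is kept when its value
    // occurs in the key list at least once, however often it is repeated.
    static List<Long> selectWhereUserIdIn(Collection<Long> keys) {
        return USER_IDS_OF_ROWS.stream()
                .filter(keys::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> withDuplicates = Arrays.asList(10L, 10L, 20L, 10L);
        Set<Long> unique = new HashSet<>(withDuplicates);
        // Both calls select the same rows.
        System.out.println(selectWhereUserIdIn(withDuplicates)
                .equals(selectWhereUserIdIn(unique)));
    }
}
```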

So in good code, use Iterable:

    @Query("select ba from BankAccount ba where ba.user.id in :ids")
    List<BankAccount> findByUserIds(@Param("ids") Iterable<Long> ids);

Home-made queries

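Look carefully at this code and find three shortcomings and one possible error in it:

    @Query("from User u where u.id in :ids")
    List<User> findAll(@Param("ids") Iterable<Long> ids);

Think a little longer

Everything is already implemented in SimpleJpaRepository::findAllById
A wasted query is issued when an empty list is passed (SimpleJpaRepository::findAllById has a corresponding check)
All queries declared with @Query are validated while the context starts up, which takes time (unlike SimpleJpaRepository::findAllById)
With Oracle, an empty key collection produces the error ORA-00936: missing expression (which does not happen with SimpleJpaRepository::findAllById, see the second point)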
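The empty-collection check mentioned in the second and fourth points can be sketched as follows. This is plain Java written in the spirit of that check, not the Spring Data source: EmptyKeysGuardSketch, queryDatabase and the queriesIssued counter are hypothetical names used only to make the saved round trip visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class EmptyKeysGuardSketch {

    // Counts simulated round trips to the database.
    static int queriesIssued = 0;

    // Hypothetical stand-in for executing "... where u.id in (?, ?, ...)".
    static List<String> queryDatabase(Iterable<Long> ids) {
        queriesIssued++;
        List<String> rows = new ArrayList<>();
        for (Long id : ids) {
            rows.add("user-" + id);
        }
        return rows;
    }

    static List<String> findAll(Iterable<Long> ids) {
        // No keys: answer immediately, without a query and without
        // sending an empty "in ()" list for the database to reject.
        if (!ids.iterator().hasNext()) {
            return Collections.emptyList();
        }
        return queryDatabase(ids);
    }

    public static void main(String[] args) {
        findAll(Collections.<Long>emptyList());
        findAll(Arrays.asList(1L, 2L));
        System.out.println("queries issued: " + queriesIssued);
    }
}
```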

Harry Potter and the Composite Key

Take a look at two examples and choose the one you like:

Approach one

    @Embeddable
    public class CompositeKey implements Serializable {
        Long key1;
        Long key2;
    }

    @Entity
    public class CompositeKeyEntity {
        @EmbeddedId
        CompositeKey key;
    }

Approach two

    @Embeddable
    public class CompositeKey implements Serializable {
        Long key1;
        Long key2;
    }

    @Entity
    @IdClass(value = CompositeKey.class)
    public class CompositeKeyEntity {
        @Id
        Long key1;
        @Id
        Long key2;
    }

At first glance there is no difference. Now let's try the first approach and run a simple test:

    // case for @EmbeddedId
    @Test
    public void findAll() {
        int size = entityWithCompositeKeyRepository.findAllById(compositeKeys).size();
        assertEquals(5, size);
    }

In the query log (you do keep one, right?) we will see the following:

    select e.key1, e.key2
    from CompositeKeyEntity e
    where e.key1 = ? and e.key2 = ?
       or e.key1 = ? and e.key2 = ?
       or e.key1 = ? and e.key2 = ?
       or e.key1 = ? and e.key2 = ?
       or e.key1 = ? and e.key2 = ?

Now the second example:

    // case for @IdClass with two @Id fields
    @Test
    public void _findAll() {
        int size = anotherEntityWithCompositeKeyRepository.findAllById(compositeKeys).size();
        assertEquals(5, size);
    }

The query log looks different:

    select e.key1, e.key2 from CompositeKeyEntity e where e.key1=? and e.key2=?
    select e.key1, e.key2 from CompositeKeyEntity e where e.key1=? and e.key2=?
    select e.key1, e.key2 from CompositeKeyEntity e where e.key1=? and e.key2=?
    select e.key1, e.key2 from CompositeKeyEntity e where e.key1=? and e.key2=?
    select e.key1, e.key2 from CompositeKeyEntity e where e.key1=? and e.key2=?

That is the whole difference: in the first case we always get 1 query, in the second case n queries.
The reason for this behavior lies in SimpleJpaRepository::findAllById:

    // ...
    if (entityInfo.hasCompositeId()) {
        List<T> results = new ArrayList<>();
        for (ID id : ids) {
            findById(id).ifPresent(results::add);
        }
        return results;
    }
    // ...

Which approach is better is for you to decide, depending on how much the number of queries matters.
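To make the two shapes of the query log easier to compare, here is a small sketch that only builds the statement text for n keys. It is plain Java and purely illustrative: CompositeKeyQueriesSketch is a hypothetical helper, and this is not how Hibernate or Spring Data generate SQL internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CompositeKeyQueriesSketch {

    static final String SELECT = "select e.key1, e.key2 from CompositeKeyEntity e where ";
    static final String ONE_KEY = "e.key1 = ? and e.key2 = ?";

    // Shape seen in the log for @EmbeddedId: one statement, n predicates joined by "or".
    static String singleQuery(int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("at least one key is required");
        }
        return SELECT + String.join(" or ", Collections.nCopies(keyCount, ONE_KEY));
    }

    // Shape seen in the log for @IdClass: n statements, one per key.
    static List<String> perKeyQueries(int keyCount) {
        List<String> statements = new ArrayList<>();
        for (int i = 0; i < keyCount; i++) {
            statements.add(SELECT + ONE_KEY);
        }
        return statements;
    }

    public static void main(String[] args) {
        System.out.println(singleQuery(5));
        perKeyQueries(5).forEach(System.out::println);
    }
}
```

With five keys the first method prints one long statement and the second prints five short ones, which is exactly the trade-off between statement size and number of round trips.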

Redundant CrudRepository::save

Code often contains an antipattern like this:

    @Transactional
    public BankAccount updateRate(Long id, BigDecimal rate) {
        BankAccount account = repo.findById(id).orElseThrow(NullPointerException::new);
        account.setRate(rate);
        return repo.save(account);
    }

The reader is puzzled: where is the antipattern? The code looks perfectly logical: get the entity, update it, save it. Everything is as in the best houses of St. Petersburg. I dare say that the call to CrudRepository::save is redundant here.

First, the updateRate method is transactional, so Hibernate tracks all changes to the managed entity and turns them into a query when Session::flush executes, which in this code happens at the end of the method.

Second, let's look at the CrudRepository::save method. As you know, all repositories are based on SimpleJpaRepository. Here is its implementation of CrudRepository::save:

    @Transactional
    public <S extends T> S save(S entity) {
        if (entityInformation.isNew(entity)) {
            em.persist(entity);
            return entity;
        } else {
            return em.merge(entity);
        }
    }

There is a subtlety that not everyone remembers: Hibernate works through events. In other words, each user action generates an event that is queued and processed with regard to the other events in the same queue. In this case the call to EntityManager::merge generates a MergeEvent, which by default is handled in DefaultMergeEventListener::onMerge. That method contains fairly branchy but simple logic for each state of the entity argument. In our case the entity was obtained from a repository inside a transactional method and is in the PERSISTENT state (that is, it is essentially managed by the framework):

    protected void entityIsPersistent(MergeEvent event, Map copyCache) {
        LOG.trace("Ignoring persistent instance");
        Object entity = event.getEntity();
        EventSource source = event.getSession();
        EntityPersister persister = source.getEntityPersister(event.getEntityName(), entity);
        ((MergeContext) copyCache).put(entity, entity, true);
        this.cascadeOnMerge(source, persister, entity, copyCache);      //<----
        this.copyValues(persister, entity, entity, source, copyCache);  //<----
        event.setResult(entity);
    }

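The devil is in the details, namely in the methods DefaultMergeEventListener::cascadeOnMerge and DefaultMergeEventListener::copyValues. Let's hear it straight from Vlad Mihalcea, one of the key Hibernate developers:

In the copyValues method call, the hydrated state is copied again, so a new array is redundantly created, therefore wasting CPU cycles. If the entity has child associations and the merge operation is also cascaded from parent to child entities, the overhead is even greater because each child entity will propagate a MergeEvent and the cycle continues.

In other words, work is being done that does not need to be done. As a result, our code can be simplified and made faster at the same time:

    @Transactional
    public BankAccount updateRate(Long id, BigDecimal rate) {
        BankAccount account = repo.findById(id).orElseThrow(NullPointerException::new);
        account.setRate(rate);
        return account;
    }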
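Why the explicit save is unnecessary becomes clearer with a toy model of dirty checking. The sketch below is plain Java and a deliberate simplification: DirtyCheckingSketch, ToySession and the generated SQL strings are hypothetical and only mimic the idea of comparing managed entities with their loaded state at flush time. Hibernate's real mechanism is far more elaborate.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DirtyCheckingSketch {

    static class Account {
        final long id;
        BigDecimal rate;

        Account(long id, BigDecimal rate) {
            this.id = id;
            this.rate = rate;
        }
    }

    // Toy persistence context: remembers the loaded state of every managed entity.
    static class ToySession {
        private final Map<Account, BigDecimal> snapshots = new LinkedHashMap<>();
        final List<String> executedSql = new ArrayList<>();

        // "Loads" an entity and takes a snapshot of its state.
        Account load(long id, BigDecimal storedRate) {
            Account account = new Account(id, storedRate);
            snapshots.put(account, storedRate);
            return account;
        }

        // Compares the current state with the snapshot and emits an
        // update only for entities that have actually changed.
        void flush() {
            for (Map.Entry<Account, BigDecimal> entry : snapshots.entrySet()) {
                Account account = entry.getKey();
                if (account.rate.compareTo(entry.getValue()) != 0) {
                    executedSql.add("update bank_account set rate = " + account.rate
                            + " where id = " + account.id);
                    entry.setValue(account.rate);
                }
            }
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        Account account = session.load(1L, BigDecimal.ONE);
        account.rate = BigDecimal.TEN; // no save call anywhere
        session.flush();               // the change is still detected
        session.executedSql.forEach(System.out::println);
    }
}
```

The managed object is modified and never passed to any save method, yet the flush still produces the update, which is the behavior the simplified updateRate relies on.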

Of course, constantly keeping this in mind while developing and reviewing other people's code is inconvenient, so we would like a change at the framework level that makes JpaRepository::save lose its harmful properties. Is that possible?

Yes, it is

    // before
    @Transactional
    public <S extends T> S save(S entity) {
        if (entityInformation.isNew(entity)) {
            em.persist(entity);
            return entity;
        } else {
            return em.merge(entity);
        }
    }

    // after
    @Transactional
    public <S extends T> S save(S entity) {
        if (entityInformation.isNew(entity)) {
            em.persist(entity);
            return entity;
        } else if (!em.contains(entity)) {
            return em.merge(entity);
        }
        return entity;
    }

In fact, these changes were made in December 2017:
https://jira.spring.io/browse/DATAJPA-931
https://github.com/spring-projects/spring-data-jpa/pull/237

However, the sophisticated reader has probably already sensed that something is off. Indeed, this change breaks nothing only in the simple case where there are no child entities:

    @Entity
    public class BankAccount {
        @Id
        Long id;
        @Column
        BigDecimal rate = BigDecimal.ZERO;
    }

Now suppose the account has its owner attached:

    @Entity
    public class BankAccount {
        @Id
        Long id;
        @Column
        BigDecimal rate = BigDecimal.ZERO;
        @ManyToOne
        @JoinColumn(name = "user_id")
        User user;
    }

There is a method that detaches the user from the account and hands the account over to a new user:

    @Transactional
    public BankAccount changeUser(Long id, User newUser) {
        BankAccount account = repo.findById(id).orElseThrow(NullPointerException::new);
        account.setUser(newUser);
        return repo.save(account);
    }

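What happens now? The check em.contains(entity) returns true, which means em.merge(entity) is not called. If the key of the User entity is generated from a sequence (one of the most common cases), it will not be created until the transaction completes (or Session::flush is called manually), i.e. the user will be in the DETACHED state while its parent entity (the account) is in the PERSISTENT state. In some cases this can break the application logic, which is exactly what happened:

02/03/2018 DATAJPA-931 breaks merge with RepositoryItemWriter

As a result, the task "Revert optimizations made for existing entities in CrudRepository::save" was created and the change was rolled back: Revert DATAJPA-931.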

Blind CrudRepository::findById

We continue with the same data model:

    @Entity
    public class User {
        @Id
        Long id;
        // ...
    }

    @Entity
    public class BankAccount {
        @Id
        Long id;
        @ManyToOne
        @JoinColumn(name = "user_id")
        User user;
    }

The application has a method that creates a new account for the specified user:

    @Transactional
    public BankAccount newForUser(Long userId) {
        BankAccount account = new BankAccount();
        userRepository.findById(userId).ifPresent(account::setUser); //<----
        return accountRepository.save(account);
    }

With versions 2.* the antipattern marked by the arrow is not so conspicuous; it is easier to see in older versions:

    @Transactional
    public BankAccount newForUser(Long userId) {
        BankAccount account = new BankAccount();
        account.setUser(userRepository.findOne(userId)); //<----
        return accountRepository.save(account);
    }

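If you cannot spot the flaw by eye, take a look at the queries:

    select u.id, u.name from user u where u.id = ?
    call next value for hibernate_sequence
    insert into bank_account (id, /*…*/ user_id) values (/*…*/)

The first query fetches the user by key. Next we obtain from the database a key for the newborn account and insert the account into the table. The only thing we take from the user is the key, which we already have as the method argument. On the other hand, BankAccount contains a user field and we cannot leave it empty (being decent people, we put a constraint in the schema). Experienced developers probably already see a way to ~~have their cake and eat it too~~ get the user without issuing the query:

    @Transactional
    public BankAccount newForUser(Long userId) {
        BankAccount account = new BankAccount();
        account.setUser(userRepository.getOne(userId)); //<----
        return accountRepository.save(account);
    }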

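JpaRepository::getOne returns a wrapper around the key that has the same type as the "live" entity. This code produces only two queries:

    call next value for hibernate_sequence
    insert into bank_account (id, /*…*/ user_id) values (/*…*/)

When the entity being created has many fields with many-to-one / one-to-one relationships, this technique helps speed up saving and reduces the load on the database.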
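The idea of a "wrapper around the key" can be illustrated with a hand-written lazy reference. This is a plain Java sketch under stated assumptions: LazyReferenceSketch, UserReference, selectUser and the selects counter are hypothetical, and a real persistence provider creates such proxies for you rather than requiring a separate class.

```java
class LazyReferenceSketch {

    // Counts simulated "select ... from user where id = ?" round trips.
    static int selects = 0;

    static class User {
        final long id;
        final String name;

        User(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical stand-in for loading the user row from the database.
    static User selectUser(long id) {
        selects++;
        return new User(id, "user-" + id);
    }

    // Knows the key up front and loads the rest only when
    // something other than the key is requested.
    static class UserReference {
        private final long id;
        private User loaded;

        UserReference(long id) {
            this.id = id;
        }

        long getId() {
            return id; // enough to fill a foreign key, no database access
        }

        String getName() {
            if (loaded == null) {
                loaded = selectUser(id); // the first access triggers the select
            }
            return loaded.name;
        }
    }

    public static void main(String[] args) {
        UserReference reference = new UserReference(42L);
        System.out.println(reference.getId() + ", selects so far: " + selects);
        System.out.println(reference.getName() + ", selects so far: " + selects);
    }
}
```

Writing the foreign key of a new account needs only getId(), so in this model no select is issued until some other property is read.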

Executing an HQL query

This is a separate and interesting topic :). The domain model is the same, and we have this query:

    @Query("select count(ba) " +
           "  from BankAccount ba " +
           "  join ba.user user " +
           " where user.id = :id")
    long countUserAccounts(@Param("id") Long id);

Consider the "pure" HQL:

    select count(ba)
      from BankAccount ba
      join ba.user user
     where user.id = :id

When it is executed, the following SQL query is generated:

    select count(ba.id)
      from bank_account ba
     inner join user u on ba.user_id = u.id
     where u.id = ?

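The problem here is not immediately obvious even to a developer who is wise in the ways of life and knows SQL well: the inner join on the user key excludes from the selection the accounts with a missing user_id (and, properly, inserting such accounts should be forbidden at the schema level), which means that in general there is no need to join the user table at all. The query can be simplified (and sped up):

    select count(ba.id)
      from bank_account ba
     where ba.user_id = ?

There is a way to get this behavior easily with HQL:

    @Query("select count(ba) " +
           "  from BankAccount ba " +
           " where ba.user.id = :id")
    long countUserAccounts(@Param("id") Long id);

This method produces the "lightweight" query.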

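Query versus method

One of the main features of Spring Data is the ability to derive a query from a method name, which is very convenient, especially in combination with the smart plugin for IntelliJ IDEA. The query from the previous example can easily be rewritten: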

    // before
    @Query("select count(ba) " +
           "  from BankAccount ba " +
           " where ba.user.id = :id")
    long countUserAccounts(@Param("id") Long id);

    // after
    long countByUser_Id(Long id);

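It looks simpler, shorter and more readable, and most importantly you do not need to look at the query itself. You read the method name and it is already clear what is selected and how. But the devil is in the details here too. We have already seen the final query for the method annotated with @Query. What happens in the second case?

Boom!

    select count(ba.id)
      from bank_account ba
      left outer join // <--- !!!!!!!
           user u on ba.user_id = u.id
     where u.id = ?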

"What the hell!?" the developer will exclaim. After all, we have already seen that the ~~violinist~~ join is not needed.

The reason is prosaic: this is a bug in the framework that has since been fixed in patched versions.

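If you have not yet upgraded to the patched versions and the join slows the query down here and now, do not despair: there are two ways to ease the pain.

A good way is to add optional = false (if the schema allows it):

    @Entity
    public class BankAccount {
        @Id
        Long id;
        @ManyToOne(optional = false)
        @JoinColumn(name = "user_id")
        User user;
    }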

The crutch is to add a column of the same type as the key of the User entity and use it in queries instead of the user field:

    @Entity
    public class BankAccount {
        @Id
        Long id;
        @ManyToOne
        @JoinColumn(name = "user_id")
        User user;
        @Column(name = "user_id", insertable = false, updatable = false)
        Long userId;
    }

Now the query derived from the method looks nicer:

    long countByUserId(Long id);

yields

    select count(ba.id)
      from bank_account ba
     where ba.user_id = ?

Which is what we wanted.

Limiting the selection

For our purposes we need to limit the selection (for example, we want to return an Optional from a *RepositoryCustom method):

    select ba.*
      from bank_account ba
     order by ba.rate
     limit ?

Now the Java:

    @Override
    public Optional<BankAccount> findWithHighestRate() {
        String query = "select b from BankAccount b order by b.rate";
        BankAccount account = em
                .createQuery(query, BankAccount.class)
                .setFirstResult(0)
                .setMaxResults(1)
                .getSingleResult();
        return Optional.ofNullable(account);
    }

This code has one unpleasant property: if the query returns an empty selection, an exception is thrown

    Caused by: javax.persistence.NoResultException: No entity found for query

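In the projects I have seen, this was solved in two main ways:

try-catch in variations ranging from bluntly catching the exception and returning Optional.empty() to more advanced ones, such as passing a lambda with the query to a utility method
wrapping the repository methods that return Optional (for example, in an aspect)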
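The first workaround, a utility method that receives the query as a lambda, might look roughly like this. The sketch is plain Java so that it compiles on its own: NoResultToOptionalSketch and its nested NoResultException are hypothetical stand-ins (the nested exception plays the role of javax.persistence.NoResultException), and in a real project the supplier would wrap a call such as getSingleResult().

```java
import java.util.Optional;
import java.util.function.Supplier;

class NoResultToOptionalSketch {

    // Local stand-in for javax.persistence.NoResultException,
    // so that the sketch compiles without JPA on the classpath.
    static class NoResultException extends RuntimeException {
    }

    // Runs the single-result query and converts "no rows" into an empty Optional.
    static <T> Optional<T> optional(Supplier<T> singleResultQuery) {
        try {
            return Optional.ofNullable(singleResultQuery.get());
        } catch (NoResultException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> found = optional(() -> "account-1");
        Optional<String> missing = optional(() -> {
            throw new NoResultException();
        });
        System.out.println(found.isPresent() + " " + missing.isPresent());
    }
}
```

It works, but every caller has to remember to route its query through the helper, and the exception is still created and thrown on each miss.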

And only very rarely have I seen the right solution:

    @Override
    public Optional<BankAccount> findWithHighestRate() {
        String query = "select b from BankAccount b order by b.rate";
        return em.unwrap(Session.class)
                .createQuery(query, BankAccount.class)
                .setFirstResult(0)
                .setMaxResults(1)
                .uniqueResultOptional();
    }

EntityManager is part of the JPA standard, while Session belongs to Hibernate and is, IMHO, a more advanced tool that is often forgotten.

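[Sometimes] harmful improvements

When we need to get one small field from a "thick" entity, we do it like this:

    @Query("select a.available from BankAccount a where a.id = :id")
    boolean findIfAvailable(@Param("id") Long id);

The query fetches a single boolean field without loading the whole entity (with everything that entails: adding it to the first-level cache, checking it for changes at the end of the session, and other overhead). Sometimes this not only fails to improve performance but does the opposite and creates unnecessary queries out of thin air. Imagine code that performs some checks:

    @Override
    @Transactional
    public boolean checkAccount(Long id) {
        BankAccount acc = repository.findById(id).orElseThrow(NullPointerException::new);
        // ...
        return repository.findIfAvailable(id);
    }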

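This code issues at least 2 queries, although the second one can be avoided:

    @Override
    @Transactional
    public boolean checkAccount(Long id) {
        BankAccount acc = repository.findById(id).orElseThrow(NullPointerException::new);
        // ...
        return repository.findById(id) // served from the first-level cache
                .map(BankAccount::isAvailable)
                .orElseThrow(IllegalStateException::new);
    }

The conclusion is simple: do not neglect the first-level cache. Within a single transaction only the first JpaRepository::findById call hits the database, because the first-level cache is always on and is bound to the session, which in turn is usually bound to the current transaction.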
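The behavior described in the conclusion can be modeled with a map that sits in front of the "database". This is a plain Java sketch, not Hibernate code: FirstLevelCacheSketch, ToySession and the selects counter are hypothetical names, and the table is just an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class FirstLevelCacheSketch {

    // Toy session: a per-session cache in front of a hypothetical table.
    static class ToySession {
        private final Map<Long, String> table;
        private final Map<Long, String> firstLevelCache = new HashMap<>();
        int selects = 0; // counts simulated round trips to the database

        ToySession(Map<Long, String> table) {
            this.table = table;
        }

        Optional<String> findById(long id) {
            String cached = firstLevelCache.get(id);
            if (cached != null) {
                return Optional.of(cached); // answered from the cache, no select
            }
            selects++;
            String row = table.get(id);
            if (row != null) {
                firstLevelCache.put(id, row);
            }
            return Optional.ofNullable(row);
        }
    }

    public static void main(String[] args) {
        Map<Long, String> table = new HashMap<>();
        table.put(1L, "account-1");
        ToySession session = new ToySession(table);
        session.findById(1L);
        session.findById(1L);
        System.out.println("selects: " + session.selects);
    }
}
```

The second lookup of the same key within the same session is answered from memory, which is why the repeated findById in checkAccount costs nothing extra.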

Tests to play with (they live in the repository mentioned at the beginning of the article):

narrow interface test: InterfaceNarrowingTest
composite key example test: EntityWithCompositeKeyRepositoryTest
redundant CrudRepository::save test: ModifierTest.java
blind CrudRepository::findById test: ChildServiceImplTest
unnecessary left join test: BankAccountControlRepositoryTest

The cost of the extra call to CrudRepository::save can be measured with RedundantSaveBenchmark. It is launched with the BenchmarkRunner class.
