인덱스 적용하여 검색 성능 높이기

욜로가 서비스에는 다양한 목록 조회 API 들을 제공한다. 목록을 조회하는 기능은 초반에는 문제가 없을 수 있으나, 데이터의 양이 늘어나면 늘어날 수록 검색 속도가 느려진다. 이를 예방하기 위해 욜로가 서비스에서 제공하는 모든 목록 API 들을 조사하고 필요한 곳에 인덱스를 걸어서 검색 속도를 개선하고자 한다. 데이터가 많아질 때에 인덱스가 제대로 동작하는지 알기 위해서 러닝크루 데이터 10만개를 미리 생성해두었다.

욜로가 서비스에서 제공하는 목록 조회 API 는 다음과 같이 총 5개이다. 각 API 간의 평균 응답 시간을 측정하기 위해서 JMeter 를 이용하였다. 가상 사용자 10명이 1초당 한번씩 총 10번을 요청하는 환경에서 5분동안 응답하는 시간의 평균 시간이다. (내 주변 러닝크루 목록 조회 API 가 성능이 너무 안좋아서 10명으로 셋팅해두었다…)

	평균 응답 시간	실행되는 쿼리문
내 주변 러닝크루 목록 조회	1769ms	SELECT *
FROM running_crew
WHERE ST_Distance_Sphere(POINT(?, ?), departure) <= ?
AND archived = true limit ? ;
현재 진행중인 내 러닝크루 목록 조회	459ms	SELECT rc.*
FROM running_crew AS rc
JOIN participant AS p
ON rc.id = p.running_crew_id
WHERE rc.status = ?
AND rc.archived = true
AND p.member_id = ?
AND p.status = ?
AND p.archived = true limit ? ;
내가 주최한 러닝크루 목록 조회	421ms	select r1_0.*
from running_crew r1_0
where r1_0.member_id=?
and r1_0.status=?
and r1_0.archived limit ?,?
내가 참여한 러닝크루 목록 조회	461ms	SELECT rc.*
FROM running_crew AS rc
JOIN partic pant AS p
ON rc.id = p.running_crew_id
WHERE p.member_id = ?
AND p.status = ?
AND p.archived = true
AND rc.status = ?
AND rc.archived = true limit ? ;
러닝크루 참여자 목록 조회	494ms	SELECT *
FROM participant AS p
LEFT JOIN member AS m
ON p.member_id = m.id
WHERE p.running_crew_id = ?
AND p.status = ?
AND p.archived = true limit ? ;

그래프

평균 응답 시간을 기준으로 보면 ‘내 주변 러닝크루 목록 조회’ 가 가장 성능이 나쁘다. 하지만 실제로 실행되는 쿼리문을 보면 이는 MySQL 에서 제공하는 함수를 사용한다. 이번에 설정하려는 인덱스와는 다르므로 제외하였다.

그 외의 API 들을 보면 조건절에 중복되는 항목들이 많다. 카디널리티가 낮은 항목을 제외하고 크게 러닝크루의 상태값, 참여자의 상태값, 각 아이디값 등이 조건절에 중복된다. 일단 이를 염두해두고 실제로 쿼리문이 어떻게 실행되는지 실행 계획을 분석해보았다.

참여자 테이블 인덱스 생성

일단 RunningCrew 의 하위 테이블인 Participant 부터 고민해보자. Participant 가 쿼리문에 사용되는 상황은 RunningCrew 와 JOIN 하는 경우, Member 와 JOIN 하는 경우이다. 가장 단순하게 인덱스를 걸기 위해 JOIN 전에 테이블의 row 수를 줄여보자. Participant 에서 모든 쿼리문에 중복적으로 사용되는 조건은 running_crew_id, member_id, status 이다. 세 개의 조건 모두 Participant 를 조회할 때 필수값이다. 하지만 status 는 비즈니스에 따라서 제외될 가능성이 있는 컬럼이기 때문에 인덱스의 가장 마지막에 배치한다.

그러면 러닝크루 아이디와 참여자 아이디 중 어느 것을 앞에 배치하는 것이 좋을까? 이는 데이터의 분포도에 따라 다를 거라고 생각한다. 따라서 각 상황에 맞게 사용할 수 있도록 (running_crew_id, member_id, status) 와 (member_id, running_crew_id, status) 둘 다 인덱스를 걸어주었다.

현재 진행중인 내 러닝크루 목록 조회 (인덱스 설정 전) - 74.1ms

-> Limit: 10 row(s)  (cost=10135.64 rows=5) (actual time=120.037..120.037 rows=0 loops=1)
    -> Nested loop inner join  (cost=10135.64 rows=5) (actual time=120.036..120.036 rows=0 loops=1)
        -> Filter: ((p.archived = true) and (p.`status` = 'PARTICIPATING') and (p.member_id = 93))  (cost=10100.65 rows=100) (actual time=0.058..119.317 rows=1 loops=1)
            -> Table scan on p  (cost=10100.65 rows=99964) (actual time=0.050..62.179 rows=100151 loops=1)
        -> Filter: ((rc.archived = true) and (rc.`status` = 'IN_PROGRESS'))  (cost=0.25 rows=0) (actual time=0.021..0.021 rows=0 loops=1)
            -> Single-row index lookup on rc using PRIMARY (id=p.running_crew_id)  (cost=0.25 rows=1) (actual time=0.020..0.020 rows=1 loops=1)

현재 진행중인 내 러닝크루 목록 조회 (인덱스 설정 후) - 9.8ms

-> Limit: 10 row(s)  (cost=0.27 rows=0) (actual time=0.046..0.046 rows=0 loops=1)
    -> Nested loop inner join  (cost=0.27 rows=0) (actual time=0.045..0.045 rows=0 loops=1)
        -> Filter: (p.archived = true)  (cost=0.26 rows=0) (actual time=0.028..0.029 rows=1 loops=1)
            -> Index lookup on p using participant_member_id_running_crew_id_status_idx (member_id=93), with index condition: (p.`status` = 'PARTICIPATING')  (cost=0.26 rows=1) (actual time=0.027..0.028 rows=1 loops=1)
        -> Filter: ((rc.archived = true) and (rc.`status` = 'IN_PROGRESS'))  (cost=0.35 rows=0) (actual time=0.014..0.014 rows=0 loops=1)
            -> Single-row index lookup on rc using PRIMARY (id=p.running_crew_id)  (cost=0.35 rows=1) (actual time=0.013..0.013 rows=1 loops=1)

내가 참여한 러닝크루 목록 조회 (인덱스 설정 전) - 99.4ms

-> Limit: 10 row(s)  (cost=10135.64 rows=5) (actual time=88.257..88.257 rows=0 loops=1)
    -> Nested loop inner join  (cost=10135.64 rows=5) (actual time=88.255..88.255 rows=0 loops=1)
        -> Filter: ((p.archived = true) and (p.`status` = 'PARTICIPATING') and (p.member_id = 93))  (cost=10100.65 rows=100) (actual time=0.050..88.237 rows=1 loops=1)
            -> Table scan on p  (cost=10100.65 rows=99964) (actual time=0.045..50.211 rows=100151 loops=1)
        -> Filter: ((rc.archived = true) and (rc.`status` = 'IN_PROGRESS'))  (cost=0.25 rows=0) (actual time=0.018..0.018 rows=0 loops=1)
            -> Single-row index lookup on rc using PRIMARY (id=p.running_crew_id)  (cost=0.25 rows=1) (actual time=0.016..0.016 rows=1 loops=1)

내가 참여한 러닝크루 목록 조회 (인덱스 설정 후) - 10.2ms

-> Limit: 10 row(s)  (cost=0.27 rows=0) (actual time=0.042..0.042 rows=0 loops=1)
    -> Nested loop inner join  (cost=0.27 rows=0) (actual time=0.041..0.041 rows=0 loops=1)
        -> Filter: (p.archived = true)  (cost=0.26 rows=0) (actual time=0.025..0.027 rows=1 loops=1)
            -> Index lookup on p using participant_member_id_running_crew_id_status_idx (member_id=93), with index condition: (p.`status` = 'PARTICIPATING')  (cost=0.26 rows=1) (actual time=0.024..0.025 rows=1 loops=1)
        -> Filter: ((rc.archived = true) and (rc.`status` = 'IN_PROGRESS'))  (cost=0.35 rows=0) (actual time=0.013..0.013 rows=0 loops=1)
            -> Single-row index lookup on rc using PRIMARY (id=p.running_crew_id)  (cost=0.35 rows=1) (actual time=0.012..0.012 rows=1 loops=1)

러닝크루 참여자 목록 조회 (인덱스 설정 전) - 17.0ms

-> Limit: 10 row(s)  (cost=10135.64 rows=10) (actual time=0.052..0.070 rows=10 loops=1)
    -> Nested loop left join  (cost=10135.64 rows=100) (actual time=0.051..0.069 rows=10 loops=1)
        -> Filter: ((p.archived = true) and (p.`status` = 'PARTICIPATING') and (p.running_crew_id = 270))  (cost=10100.65 rows=100) (actual time=0.043..0.048 rows=10 loops=1)
            -> Table scan on p  (cost=10100.65 rows=99964) (actual time=0.039..0.041 rows=10 loops=1)
        -> Single-row index lookup on m using PRIMARY (id=p.member_id)  (cost=0.25 rows=1) (actual time=0.002..0.002 rows=1 loops=10)

러닝크루 참여자 목록 조회 (인덱스 설정 후) - 10.5ms