Overdue evolution in SQL
Despite my efforts I cannot think of a way to answer my need :
I have 2 tables containing respectively a set of loans to be reimbursed, and a set of reimbursements on these loans (not all loans have an entry in reimbursement table because some of them remain unpaid).
Table loan has columns:
- id
 - amount
 - due_date
 
Table reimbursement has columns:
- id
 - debt_id
 - payment_date
 
My goal is to obtain the amount that was overdue for more than 5 days for each date in the past.
Here is where I am at (in Postgres grammar):
with due_for_5_days_by_date AS (
    select 
    due_date + interval '5' DAY as due_date,
    sum(amount) as amount
    FROM loan
    GROUP BY due_date
),
paid_by_date as (
    select 
        payment_date,
        sum(amount) as amount
    FROM reimbursement
    join loan on loan.id = loan_id
    GROUP BY payment_date
),
cumulated as (
    SELECT
        COALESCE(due_date, payment_date) as date,
        SUM(COALESCE(paid_by_date.amount, 0)) over (order by COALESCE(due_date, payment_date)) as paid,
        SUM(COALESCE(due_for_5_days_by_date.amount, 0)) over (order by COALESCE(due_date, payment_date)) as due_5d
    FROM due_for_5_days_by_date
    FULL OUTER JOIN paid_by_date on due_date = payment_date
)
select 
    DATE,
    due_5d - paid as overdue_5d
from cumulated
order by date
Explanation :
- First I get the amount due for exactly 5 days for each date
 - Then, I get the amount paid for each date
 - Then I sum the amounts obtained at 1. and 2. to get the cumulative due and paid amounts
 - Then by difference I get the overdue amount
 
But the problem is that, by doing this, the amount paid by date also contains reimbursement of loans expired for less than 5 days. In other words the paid amount for each date in the final query is overestimated as it should exclude the amount received on loans recently expired on that date. I cannot think of a way to fix this.
Can you think of a way to obtain what I want ?
Thanks a lot for your help !
Topic google-bigquery sql
Category Data Science