Using regex in Redshift to find dollar values

I have a field in a Redshift table that has user-generated text. The field is where users can say how much they think something costs.

Ideally it'd just be a decimal, but it's varchar, so users can type "I think this is worth $25", or "I'd pay 55" or "$117".

So I'm trying to use regexp_substr to pull this out, specifically regexp_substr(f.comment_text, '\\$?[0-9]*'). But this doesn't work on a subset of entries for some reason (e.g. "Could do for $115").

If I remove the ? it works on that, but no longer on entries that don't use $. Why? And what should I use instead?



As I understand it, you want to extract the numbers whether or not they are preceded by a '$'.

Why not just use [0-9]+ for that field? The results would look like

    some $1 something >> ['1']
    some other 115 >> ['115']
    $115 some thing >> ['115']
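
To try this outside the database, here is a minimal sketch in Python. It assumes that Python's `re.search` is a fair stand-in for `regexp_substr` on a pattern this simple (both return the first match). `first_number` is a made-up helper name, and it returns an empty string when nothing matches, which is what `regexp_substr` does in that case.

```python
import re

def first_number(text):
    # Return the first run of digits in the text, or '' if there is none.
    match = re.search(r'[0-9]+', text)
    return match.group(0) if match else ''

samples = ['some $1 something', 'some other 115', '$115 some thing']
for sample in samples:
    print(sample, '>>', first_number(sample))
```

Note that `+` requires at least one digit, so the pattern can never match an empty string.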

If you want to include the '$' sign, which may appear zero times, once, or (by typo) several times, then the regex would be

    \$*[0-9]+

which outputs

    some $1 something >> ['$1']
    some other 115 >> ['115']
    $115 some thing >> ['$115']
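
As for why the pattern from the question fails, this can be checked with a small sketch. It uses Python's `re` module rather than Redshift itself, on the assumption that for these simple patterns `re.search` behaves like `regexp_substr` (the first match wins); `first_match` is a made-up helper name. Both parts of `\$?[0-9]*` are optional, so the pattern can match an empty string, and the first place it can do so is the very start of the text. Anything that does not begin with '$' or a digit therefore comes back empty.

```python
import re

def first_match(pattern, text):
    # Return the first match of the pattern, or '' if there is none.
    match = re.search(pattern, text)
    return match.group(0) if match else ''

text = 'Could do for $115'

# Original pattern: matches the empty string at position 0, so nothing useful comes back.
print(repr(first_match(r'\$?[0-9]*', text)))

# Without the '?': the '$' is required, so the match starts at the '$'...
print(repr(first_match(r'\$[0-9]*', text)))
# ...but entries without a '$' no longer match at all.
print(repr(first_match(r'\$[0-9]*', "I'd pay 55")))

# Suggested pattern: at least one digit is required, the '$' stays optional.
print(repr(first_match(r'\$*[0-9]+', text)))
print(repr(first_match(r'\$*[0-9]+', "I'd pay 55")))
```

The key difference is `+` instead of `*` on the digits: once at least one digit is required, the match has to start where the number actually is.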

If this is not what you want, comment.

Hope this helps.

I would recommend using an online regex matcher. They have short and sweet references that make it quick to see what a pattern actually matches.
