Page MenuHome GnuPG

trust signature domain restrictions don't work
Closed, ResolvedPublic

Description

OSes tested: Ubuntu 16.04, Centos7

When issuing a trust signature for, say "customer.com", gnupg saves the
trust_regexp to '<[^>]+[@.]customer\.com>$', which looks right. However, when
trust checks are done, that string is passed to sanitize_regexp(), which adds
backslashes before every letter. The resulting regexp doesn't match email
addresses in the domain.

If I modify sanitize_regexp() to leave alphabet letters alone, it matches and
the trust sig works as expected.

I think this is related to bug T2284.

(I have example output and an example patch, but adding these makes the bug
tracker think I'm spam; I'll try to add again later)

Details

Version
1.4.20, 2.0.22, 2.1.11

Event Timeline

linsam set Version to 1.4.20, 2.0.22, 2.1.11.
linsam added a subscriber: linsam.

Attached example are the following setup:

user1 tsign user2 with full trust, depth 1, domain="customer.com". User2 signs
user3 through user5 (regular signatures). User4 is at customer.com, users 3 and
5 are at example.com.

Attached example patch prevents escaping normal lowercase letters.

Note that this isn't a general solution, though it does solve the issue for me.
For example, some email addresses have numbers (I don't know if having backslash
before numbers is an issue like it is for letters)

Attached example output after patch is applied. Now User4 has full validity like
expected, and the debug output shows a match for User4's email address (NOTE:
the debug output has 'YES' for no match and 'NO' for successful match)

For information, this issue was also discussed on both gnupg-user and gnupg-devel back in january 2017. I mention it here for reference.

In particular:

We need to find out when this regression happened. Back when David implemented trust signatures he ran tests with the PGP folks to make sure our implementation are compatible. This is why I call this a regression.

The cause of the regression may actually not be in GnuPG's code.

As it is, the sanitize_regexp function obviously assumes that it is OK to escape characters that do not have any special meaning, and it escapes everything. But according to POSIX, the behavior in that case is left to the implementations:

The interpretation of an ordinary character preceded by an unescaped <backslash> is undefined.

So, maybe the behavior of glibc's regex implementation on escaped ordinary characters changed at some point between the time sanitize_regexp was written and now?

In any case, I think we should escape only the characters that must be escaped according to POSIX, so that we do not depend on the behavior of a particular regex engine.

dkg added a subscriber: dkg.

including these tests (or something similar) in the gpg test suite would be a good way to avoid future regressions.

dkg renamed this task from tust signature domain restrictions don't work to trust signature domain restrictions don't work.Jul 14 2017, 1:01 PM
In T2923#100519, @dkg wrote:

including these tests (or something similar) in the gpg test suite would be a good way to avoid future regressions.

Agreed, but difficult. So far we have no tests for the trust model pgp.

As it is, the sanitize_regexp function obviously assumes that it is OK to escape characters that do not have any special meaning, and it escapes everything.

Maybe that is because rfc4880 is vague at best about the syntax of regular expressions. https://tools.ietf.org/html/rfc4880#section-8 says:

An atom is a [...] '\' followed by a single character (matching that character) [...]

The way I read that allows what sanitize_regexp is doing. However, maybe we cannot use glibc's regex implementation then.

For the reference sanitize_regexp was introduced in this commit from 2007 to "Protect against malloc bombs.": and I see no changes to it (except typo correction) in git blame in trustdb.c.

It might be not a regression. The possibilities are: (1) it was tested by using non-GNU operating system. (2) Tests didn't cover characters (b, B, w, W, s, and S).

GNU extension for regular expression supports: \b, \B (word boundary), \w, \W (word), \s, \S (space). This extension is enabled by GNU C library for regcomp.
Apparently, sanitize_regexp doesn't work well with the GNU extension for regexp.

For what is worth I think sanitize_regexp was programmed while reading 4880 because the RFC allows backslash + any character (section 8: Regular Expressions):

An atom is
(...)
a '\' followed by a single character (matching that character)

Henry Spencer wrote three implementations (old, BSD, and Tcl): https://garyhouston.github.io/regex/
Indeed, for the one in old library and BSD library, \ + CHAR means that single CHAR.
For one in Tcl library, \s, \S, \w, \W is supported (just like GNU), and \d, \D (digit) is also supported.

I think that the sanitize_regexp refers the one in old library which doesn't support "{}" operator.

I'm going to apply @gouttegd 's patch. Since the commit message is wrong (example.com just works well, while example.es doesn't), I will write my own comment.
I can't adopt John O'Meara's approach, because it's not a bug fix, but adding feature, assuming GNU regexp use somehow.

No, I was not accurate. EXAMPLE.COM works, while example.com doesn't work.

Anyway, I committed a change to master. rGccf3ba92087e: g10: Fix regexp sanitization.

It might be easier to include a regexp implementation in GnUPG proper. This way we have a well defined behaviour and it will work also on Windows. The gpg-check-pattern tool might slightly change its behaviour, though.

It's fixed in master.
It is good to backport this to GnuPG 2.2 and GnuPG 1.4.

It's in 2.2.4 and 1.4.23.
Closing.