
My current rule is:

RewriteRule ^data/(v[0-9]\.[0-9]\.?[0-9]?)/.*$ http://35.231.131.100:5000/cocoon_$1?subject=https://w3id.org/cocoon/$0 [L,NE,QSA,R=308] 
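For illustration, here is a rough Python model of what this rule does to the first example. It assumes the rule sits in a per-directory .htaccess for /cocoon/, so the pattern is matched against the path without the cocoon/ prefix. It ignores the flags and any query string. Python's `re` stands in for Apache's PCRE, and the constructs used here behave the same in both.

```python
import re

# Pattern copied from the RewriteRule. Assumption: the rule lives in a per-directory
# .htaccess for /cocoon/, so it is matched against the path without "cocoon/".
RULE = re.compile(r"^data/(v[0-9]\.[0-9]\.?[0-9]?)/.*$")


def apply_rule(path):
    """Mimic the rule's substitution; return None when the pattern doesn't match."""
    m = RULE.match(path)
    if m is None:
        return None
    # $1 is the captured version, $0 is the whole matched path.
    return ("http://35.231.131.100:5000/cocoon_" + m.group(1)
            + "?subject=https://w3id.org/cocoon/" + m.group(0))


path = ("data/v1.0.1/2019-03-07/CloudStorageTransactionsPriceSpecification/"
        "Azure/managed_disk/transactions-ssd")
print(apply_rule(path))
```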

It will convert

https://w3id.org/cocoon/data/v1.0.1/2019-03-07/CloudStorageTransactionsPriceSpecification/Azure/managed_disk/transactions-ssd

to

http://35.231.131.100:5000/cocoon_v1.0.1?subject=https://w3id.org/cocoon/data/v1.0.1/2019-03-07/CloudStorageTransactionsPriceSpecification/Azure/managed_disk/transactions-ssd

But for another original URL, for example

https://w3id.org/cocoon/data/v1.0.1/Measurement/DownlinkSpeed-1-128-KB/StorageService/Gcloud/150.203.213.249/lat=-35.271475/long=149.121434/2019-02-26T07%3A14%3A19.932Z/australia-southeast1

I need to URL-encode the value of the subject= query parameter, i.e.

http://35.231.131.100:5000/cocoon_v1.0.1?subject=https%3A%2F%2Fw3id.org%2Fcocoon%2Fdata%2Fv1.0.1%2FMeasurement%2FDownlinkSpeed-1-128-KB%2FStorageService%2FGcloud%2F150.203.213.249%2Flat%3D-35.271475%2Flong%3D149.121434%2F2019-02-26T07%253A14%253A19.932Z%2Faustralia-southeast1
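For reference, this target value is exactly what you get by percent-encoding the entire original URL. That includes every / and the % of the already-encoded %3A, which is why it shows up as %253A. Python's `urllib.parse.quote` with an empty safe set reproduces it. This only checks the desired output and is not part of the Apache setup:

```python
from urllib.parse import quote

original = ("https://w3id.org/cocoon/data/v1.0.1/Measurement/DownlinkSpeed-1-128-KB/"
            "StorageService/Gcloud/150.203.213.249/lat=-35.271475/long=149.121434/"
            "2019-02-26T07%3A14%3A19.932Z/australia-southeast1")

# safe="" means "/" is encoded too; the existing "%" in %3A becomes %25, giving %253A.
subject = quote(original, safe="")
target = "http://35.231.131.100:5000/cocoon_v1.0.1?subject=" + subject
print(target)
```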

I'm currently using the NE flag so that $1 (i.e. v1.0.1) is not escaped.

How do I encode the https://w3id.org/cocoon/$0 part?

Some background: it is the : in the date time part of the URL that stopped the page from working. Encoding it individually as %3A doesn't work, so I'm encoding the whole subject= value.


Edit

The rules suggested by MrWhite don't quite work:

RewriteCond %{THE_REQUEST} [a-z]{3,5}\s.*?/(data/(v[0-9]\.[0-9]\.?[0-9]?)/.*)\s [NC]
RewriteRule ^data/(v[0-9]\.[0-9]\.?[0-9]?)/.* http://35.231.131.100:5000/cocoon_$1?subject=https\%3A\%2F\%2Fw3id.org\%2Fcocoon\%2F%1 [L,NE,QSA,R=308]

I tested with

curl http://localhost/cocoon/data/v1.0.1/Measurement/DownlinkSpeed-1-128-KB/StorageService/Gcloud/150.203.213.249/lat=-35.271475/long=149.121434/2019-02-26T07%3A14%3A19.932Z/australia-southeast1

It redirects to http://35.231.131.100:5000/cocoon_v1.0.1?subject=https%3A%2F%2Fw3id.org%2Fcocoon%2Fdata/v1.0.1/Measurement/DownlinkSpeed-1-128-KB/StorageService/Gcloud/150.203.213.249/lat=-35.271475/long=149.121434/2019-02-26T07%3A14%3A19.932Z/australia-southeast1

My Linked Data Fragments server can't recognize this: the / isn't encoded. I think the subject doesn't accept a partially encoded string. Because of the :, it has to be encoded, so the whole subject string has to be encoded.
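One way to compare the two redirects is to decode the subject parameter from each once, the way a typical query-string parser would. This assumes the LDF server decodes the query string exactly once, which has not been checked against its implementation. With the partial encoding the %3A comes back as :, while the fully encoded version gives back the original URL unchanged:

```python
from urllib.parse import parse_qs, urlsplit

original = ("https://w3id.org/cocoon/data/v1.0.1/Measurement/DownlinkSpeed-1-128-KB/"
            "StorageService/Gcloud/150.203.213.249/lat=-35.271475/long=149.121434/"
            "2019-02-26T07%3A14%3A19.932Z/australia-southeast1")

# Redirect produced by the suggested rules (copied from above).
partial = ("http://35.231.131.100:5000/cocoon_v1.0.1?subject=https%3A%2F%2Fw3id.org%2Fcocoon%2F"
           "data/v1.0.1/Measurement/DownlinkSpeed-1-128-KB/StorageService/Gcloud/"
           "150.203.213.249/lat=-35.271475/long=149.121434/"
           "2019-02-26T07%3A14%3A19.932Z/australia-southeast1")

# Redirect with the whole subject encoded (the target from the question).
full = ("http://35.231.131.100:5000/cocoon_v1.0.1?subject=https%3A%2F%2Fw3id.org%2Fcocoon%2F"
        "data%2Fv1.0.1%2FMeasurement%2FDownlinkSpeed-1-128-KB%2FStorageService%2FGcloud%2F"
        "150.203.213.249%2Flat%3D-35.271475%2Flong%3D149.121434%2F"
        "2019-02-26T07%253A14%253A19.932Z%2Faustralia-southeast1")


def decoded_subject(url):
    """Decode the subject parameter once, as a typical query-string parser would."""
    return parse_qs(urlsplit(url).query)["subject"][0]


print(decoded_subject(partial) == original)  # False: %3A has become ":"
print(decoded_subject(full) == original)     # True: the original URL comes back intact
```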

As for the B flag, I tested with B=/, and it seems everything gets encoded twice, i.e. . becomes %252e and / becomes %252f?
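The %25 prefix is the telltale sign of double encoding. %252e is what %2e turns into when its % is escaped a second time. Here is a small illustration with `urllib.parse`. The first-pass string is typed in by hand in lowercase hex to match the question, because `quote()` never escapes "." itself:

```python
from urllib.parse import quote, unquote

# First escaping pass, written by hand (hypothetical) to match the question's output.
escaped_once = "v1%2e0%2e1"
# A second pass escapes each "%" as "%25", giving the %252e pattern.
escaped_twice = quote(escaped_once, safe="")
print(escaped_twice)                    # v1%252e0%252e1
print(quote("%2f", safe=""))            # %252f
print(unquote(unquote(escaped_twice)))  # v1.0.1 (needs two decodes)
```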

And thank you for pointing out the unintentional trailing dot; I actually want v[0-9]\.[0-9](?:\.[0-9])?
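A quick comparison of the two version patterns with Python's `re`, whose syntax matches PCRE for these constructs. The old pattern accepts both a trailing dot and a two-digit minor version. The intended one rejects both. Because the optional patch part uses a non-capturing `(?:...)` group, the version is still `$1`, and no extra backreference is created.

```python
import re

OLD = re.compile(r"v[0-9]\.[0-9]\.?[0-9]?")      # pattern used in the rules
NEW = re.compile(r"v[0-9]\.[0-9](?:\.[0-9])?")   # intended pattern

for version in ["v1.0", "v1.0.1", "v1.0.", "v1.01"]:
    # fullmatch: the whole string must be a version, as between the slashes in the rule
    print(version, bool(OLD.fullmatch(version)), bool(NEW.fullmatch(version)))
```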

I also tried the N flag, but couldn't get it right; it ends up in an infinite loop:

RewriteRule ^data/(v[0-9]\.[0-9]\.?[0-9]?)/([^/]+)/(.*) data/$1/$2\%2F$3 [N=20]
RewriteRule ^data/(v[0-9]\.[0-9]\.?[0-9]?)/.* http://35.231.131.100:5000/cocoon_$1?subject=https\%3A\%2F\%2Fw3id.org\%2Fcocoon\%2Fdata\%2F$1\%2F$3[L,NE,QSA,R=308]

I wanted [^/]+ to match anything that isn't a /, so that I could replace every slash after the version number with its encoded value. I added \ to escape the %2F.
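As a plain string transformation, the first rule runs out of matches after one pass per slash. That suggests the loop comes from how Apache reprocesses the rewritten URL, not from the pattern itself. This Python sketch does not model Apache's behavior, only the regex logic. It assumes the pattern sees the decoded path, so the date contains : rather than %3A.

```python
import re

# First rule of the N-flag attempt; assumed to see the %-decoded path.
STEP = re.compile(r"^data/(v[0-9]\.[0-9]\.?[0-9]?)/([^/]+)/(.*)")


def rewrite_until_stable(path, limit=20):
    """Re-apply the rule the way [N=20] intends, stopping once it no longer matches."""
    passes = 0
    while passes < limit:
        m = STEP.match(path)
        if m is None:
            break
        # Substitution data/$1/$2%2F$3: join the first two segments with a literal %2F.
        path = "data/{}/{}%2F{}".format(m.group(1), m.group(2), m.group(3))
        passes += 1
    return path, passes


start = ("data/v1.0.1/Measurement/DownlinkSpeed-1-128-KB/StorageService/Gcloud/"
         "150.203.213.249/lat=-35.271475/long=149.121434/"
         "2019-02-26T07:14:19.932Z/australia-southeast1")
final, passes = rewrite_until_stable(start)
print(passes)  # 8: one pass per slash after the version
print(final)
```

Even when it terminates, the result leaves the : and = characters and the data/v1.0.1/ slashes unencoded. Also, the second rule has only one capturing group, so its $3 has nothing to refer to.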

1 Answer


You could use the B flag to escape the backreferences. However, by default that will also escape the dots in v1.0.1 in the $1 backreference, unless you explicitly state the characters that should be escaped in the B flag itself, e.g. B=: (requires Apache 2.4.26+).
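Here is a simplified Python model of the behavior described above. `b_flag_escape` is a hypothetical helper, not Apache code. It handles ASCII input only and ignores details such as B's handling of spaces and the BNP flag. With no character list, every non-alphanumeric character in the backreference is escaped, as the mod_rewrite documentation describes. With B=:, only the colon is escaped.

```python
def b_flag_escape(backref, escapeme=None):
    """Hypothetical model of the B flag (ASCII input only).

    With no character list, every non-alphanumeric character is escaped;
    with a list (as in B=:), only the listed characters are escaped.
    Lowercase hex is used to match what the question reports.
    """
    out = []
    for ch in backref:
        if escapeme is None:
            needs_escape = not (ch.isascii() and ch.isalnum())
        else:
            needs_escape = ch in escapeme
        out.append("%{:02x}".format(ord(ch)) if needs_escape else ch)
    return "".join(out)


print(b_flag_escape("v1.0.1"))                         # v1%2e0%2e1
print(b_flag_escape("v1.0.1", escapeme=":"))           # v1.0.1
print(b_flag_escape("2019-02-26T07:14:19.932Z", ":"))  # 2019-02-26T07%3a14%3a19.932Z
```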

Alternatively, if the actual problem "is the : in the date time part of the URL" and this is already correctly encoded in the requested URL (as it appears to be in your example), then you can get the already-encoded URL part from the THE_REQUEST server variable instead of the URL-path as matched by the RewriteRule pattern. The "problem" with getting the URL parts using the RewriteRule pattern is that this has already been URL-decoded (hence the reason to use the B flag as mentioned above).
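This Python sketch shows the difference. The request line below is a hypothetical one built from the example URL. Python's `re` with `IGNORECASE` stands in for the CondPattern with [NC]. `unquote` only roughly approximates the decoded path that the RewriteRule pattern sees.

```python
import re
from urllib.parse import unquote

# Hypothetical request line, as Apache would record it in THE_REQUEST.
the_request = ("GET /cocoon/data/v1.0.1/Measurement/DownlinkSpeed-1-128-KB/StorageService/"
               "Gcloud/150.203.213.249/lat=-35.271475/long=149.121434/"
               "2019-02-26T07%3A14%3A19.932Z/australia-southeast1 HTTP/1.1")

# CondPattern from the answer; [NC] becomes re.IGNORECASE.
COND = re.compile(r"[a-z]{3,5}\s.*?/(data/(v[0-9]\.[0-9]\.?[0-9]?)/.*)\s", re.IGNORECASE)

match = COND.search(the_request)
raw = match.group(1)      # what %1 holds: still percent-encoded
decoded = unquote(raw)    # roughly what the RewriteRule pattern sees
print(raw)
print(decoded)
```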

You can manually encode the first (constant) part of the query string (i.e. https://w3id.org/cocoon/ as https%3A%2F%2Fw3id.org%2Fcocoon%2F) if you wish this to be encoded.

Try the following instead:

RewriteCond %{THE_REQUEST} [a-z]{3,5}\s.*?/(data/(v[0-9]\.[0-9]\.?[0-9]?)/.*)\s [NC]
RewriteRule ^data/(v[0-9]\.[0-9]\.?[0-9]?)/.* http://35.231.131.100:5000/cocoon_$1?subject=https\%3A\%2F\%2Fw3id.org\%2Fcocoon\%2F%1 [L,NE,QSA,R=308]

Additional notes:

  • The literal % characters of the %-encoded sequences in the substitution string are backslash-escaped so that they aren't seen as backreferences to the preceding CondPattern (otherwise %3 would be a backreference to a group that doesn't exist, and %2 a backreference to the captured version number).

  • Make sure you've cleared your browser cache before testing and test with a temporary (302 or 307) redirect before changing this to permanent.
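To make the escaping note concrete, here is a simplified Python model of how the substitution string expands. `expand` is a hypothetical helper, not mod_rewrite's code. It handles only $N, %N and backslash escapes, and it assumes references to missing groups expand to nothing. The request line is also hypothetical. With the suggested rules, the escaped substitution reproduces the redirect reported in the question's edit. The unescaped one mangles the constant prefix.

```python
import re
from urllib.parse import unquote


def expand(sub, rule_groups, cond_groups):
    """Hypothetical, simplified model of mod_rewrite substitution expansion.

    $N -> group N of the RewriteRule match, %N -> group N of the RewriteCond
    match, and a backslash makes the next character literal. References to
    missing groups expand to "" (assumption). %{VAR} and ${map:key} are not modelled.
    """
    out, i = [], 0
    while i < len(sub):
        ch = sub[i]
        if ch == "\\" and i + 1 < len(sub):
            out.append(sub[i + 1])
            i += 2
        elif ch in "$%" and i + 1 < len(sub) and sub[i + 1].isdigit():
            groups = rule_groups if ch == "$" else cond_groups
            n = int(sub[i + 1])
            out.append(groups[n] if n < len(groups) and groups[n] is not None else "")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical request line for the example URL.
the_request = ("GET /cocoon/data/v1.0.1/Measurement/DownlinkSpeed-1-128-KB/StorageService/"
               "Gcloud/150.203.213.249/lat=-35.271475/long=149.121434/"
               "2019-02-26T07%3A14%3A19.932Z/australia-southeast1 HTTP/1.1")
cond = re.search(r"[a-z]{3,5}\s.*?/(data/(v[0-9]\.[0-9]\.?[0-9]?)/.*)\s",
                 the_request, re.IGNORECASE)
# The RewriteRule pattern sees the decoded path relative to /cocoon/.
rule = re.match(r"^data/(v[0-9]\.[0-9]\.?[0-9]?)/.*", unquote(cond.group(1)))

rule_groups = (rule.group(0),) + rule.groups()
cond_groups = (cond.group(0),) + cond.groups()

escaped = r"http://35.231.131.100:5000/cocoon_$1?subject=https\%3A\%2F\%2Fw3id.org\%2Fcocoon\%2F%1"
unescaped = "http://35.231.131.100:5000/cocoon_$1?subject=https%3A%2F%2Fw3id.org%2Fcocoon%2F%1"

print(expand(escaped, rule_groups, cond_groups))
print(expand(unescaped, rule_groups, cond_groups))
```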


Aside: Your regex that grabs the version number allows a trailing dot after the second ("minor") number, e.g. v1.0. Is that intentional?

  • Thank you for your suggestion! Please see the edit to my question for the problems with this solution. Also, good eye for spotting the trailing dot; I actually want v[0-9]\.[0-9](\.[0-9])? Commented Apr 5, 2019 at 2:57
