24

Given ANY GitHub repository url string like:

git://github.com/some-user/my-repo.git 

or

git@github.com:some-user/my-repo.git 

or

https://github.com/some-user/my-repo.git 

What is the best way in bash to extract the repository name my-repo from any of the strings above? The solution MUST work for all of the URL types listed.

Thanks.

10 Answers

26
$ url=git://github.com/some-user/my-repo.git
$ basename=$(basename $url)
$ echo $basename
my-repo.git
$ filename=${basename%.*}
$ echo $filename
my-repo
$ extension=${basename##*.}
$ echo $extension
git
3
  • Thanks, trying to make it a one liner, but not working. REPO_NAME=${`basename $REPO_URL`%.*} Commented Aug 14, 2012 at 4:17
  • 2
    echo $(basename "$url" ".${url##*.}"). Commented Aug 14, 2012 at 6:31
  • +1. Is there anything similar to get the hostname, i.e. github.com, instead, @quanta? Commented Mar 31, 2015 at 2:19
27

I'd go with basename $URL .git.
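A quick way to check this against all three URL forms from the question is a small loop. This is only a sketch: `repo_name` is a helper name made up for illustration, and it relies on the repository being the part after the last `/`, which also holds for the SSH form.

```shell
#!/usr/bin/env bash
# repo_name is a hypothetical helper, not part of the original answer.
repo_name() {
    # basename keeps the text after the last "/", then drops a trailing ".git"
    basename "$1" .git
}

for url in \
    "git://github.com/some-user/my-repo.git" \
    "git@github.com:some-user/my-repo.git" \
    "https://github.com/some-user/my-repo.git"
do
    repo_name "$url"   # prints my-repo for each of the three URLs
done
```

If a URL has no `.git` suffix, `basename` simply leaves the name unchanged.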

1
  • 2
    the best answer. The shortest as well Commented Apr 22, 2020 at 9:23
15

Old post, but I faced the same problem recently.

The regex ^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+).git$ works for the three types of URL.

#!/bin/bash
# url="git://github.com/some-user/my-repo.git"
# url="https://github.com/some-user/my-repo.git"
url="git@github.com:some-user/my-repo.git"

re="^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+)(.git)*$"

if [[ $url =~ $re ]]; then
    protocol=${BASH_REMATCH[1]}
    separator=${BASH_REMATCH[2]}
    hostname=${BASH_REMATCH[3]}
    user=${BASH_REMATCH[4]}
    # (.+) is greedy and also swallows a trailing .git, so strip it here
    repo=${BASH_REMATCH[5]%.git}
fi

Explanation (see it in action on regex101):

  • ^ matches the start of a string
  • (https|git) matches and captures the characters https or git
  • (:\/\/|@) matches and captures the characters :// or @
  • ([^\/:]+) matches and captures one character or more that is not / nor :
  • [\/:] matches one character that is / or :
  • ([^\/:]+) matches and captures one character or more that is not / nor :, yet again
  • \/ matches the character /
  • (.+) matches and captures one character or more
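  • (.git)* matches an optional .git suffix at the end; note that the greedy (.+) before it already consumes the suffix, so group 5 still includes .git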
  • $ matches the end of a string

This is far from perfect, as a malformed URL that mixes scheme and separator, such as https@github.com:some-user/my-repo.git, would also match, but I think it's good enough for extraction.
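To see how the pattern behaves on each of the three URL forms, the match can be wrapped in a function and run over all of them. This is a sketch rather than the script above: `parse_git_url` is a hypothetical name, the slashes are written unescaped, the `(.git)*` group is left out for simplicity, and the `.git` suffix is removed afterwards with `${...%.git}` because the greedy `(.+)` captures it.

```shell
#!/usr/bin/env bash
# parse_git_url is a hypothetical helper name used for illustration only.
# It prints "<host> <user> <repo>" or returns 1 if the URL does not match.
parse_git_url() {
    local re='^(https|git)(://|@)([^/:]+)[/:]([^/:]+)/(.+)$'
    if [[ $1 =~ $re ]]; then
        # Groups: 3 = host, 4 = user, 5 = repo (possibly still ending in .git)
        printf '%s %s %s\n' \
            "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]%.git}"
    else
        return 1
    fi
}

for url in \
    "git://github.com/some-user/my-repo.git" \
    "git@github.com:some-user/my-repo.git" \
    "https://github.com/some-user/my-repo.git"
do
    parse_git_url "$url"   # prints: github.com some-user my-repo
done
```

Stripping the suffix with parameter expansion also covers URLs that have no `.git` at the end.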

5
  • 🏆this is gold! Commented Jul 1, 2018 at 14:24
  • 2
    some urls don't have .git at the end. Commented Jan 2, 2019 at 14:58
  • @kenn: then they'd not be a valid remote for git, however. See git-scm.com/docs/git-push#URLS. Commented Feb 9, 2022 at 12:06
  • 2
    I'm using an expanded version (play with it on regex101): ^((https?|ssh|git|ftps?):\/\/)?(([^\/@]+)@)?([^\/:]+)[\/:]([^\/:]+)\/(.+).git\/?$, which better matches the official spec for URLs. Group 2 is the scheme; if it is missing, the default is ssh. Commented Feb 9, 2022 at 12:13
  • @MartijnPieters nice, thanks. I added variables to capture group contents and made .git suffix optional: ^((?<protocol>https?|ssh|git|ftps?):\/\/)?((?<user>[^\/@]+)@)?(?<hostname>[^\/:]+)[\/:](?<pathHead>[^\/:]+)\/(?<pathTail>(.+)(.git)?\/?)$, testable here: regex101.com/r/NtqNET/1 Commented Oct 14, 2023 at 9:53
6

Summing up:

  • Get the URL without the (optional) .git suffix:

    url_without_suffix="${url%.git}" 
  • Get repository name:

    reponame="$(basename "${url_without_suffix}")" 
  • Get the user (owner) name afterwards:

    username="${url_without_suffix%/${reponame}}"
    username="${username##*[/:]}"   # cut at the last / or : so the SSH form works too
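The steps above can be sketched as a single function that uses only parameter expansion. The name `split_repo_url` is hypothetical, and the sketch assumes the URL ends in `owner/repo` with an optional `.git`; the owner is cut at the last `/` or `:` so that the SSH form is handled as well.

```shell
#!/usr/bin/env bash
# split_repo_url is a hypothetical helper; it prints "<owner> <repo>".
split_repo_url() {
    local url=${1%.git}          # drop the suffix only if it is literally ".git"
    local repo=${url##*/}        # text after the last "/"
    local rest=${url%/*}         # everything before the repo name
    local owner=${rest##*[/:]}   # text after the last "/" or ":" (SSH form)
    printf '%s %s\n' "$owner" "$repo"
}

split_repo_url "git@github.com:some-user/my-repo.git"      # prints: some-user my-repo
split_repo_url "https://github.com/some-user/my-repo.git"  # prints: some-user my-repo
```

No subshell or external command is needed, which can matter when this runs inside a loop over many remotes.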
1

Use a regular expression: /([^/]+)\.git$/
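In bash, one way to apply this pattern is the `[[ =~ ]]` operator, reading the captured name from `BASH_REMATCH`. The sketch below is an assumption about how the answer might be used, not something it spells out; `repo_from_regex` is a hypothetical name.

```shell
#!/usr/bin/env bash
# repo_from_regex is a hypothetical helper name.
# It prints the last path segment without ".git", or returns non-zero
# when the URL does not end in ".git".
repo_from_regex() {
    local re='([^/]+)\.git$'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

repo_from_regex "git@github.com:some-user/my-repo.git"   # prints: my-repo
```

Because the pattern requires the suffix, a URL without `.git` produces no output here.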

0
basename "$git_repo_url" | sed 's/\.git$//' 
0

basename is my favorite, but you can also use sed:

url=git://github.com/some-user/my-repo.git
reponame="$(echo $url | sed -r 's/.+\/([^.]+)(\.git)?/\1/')"
# reponame = "my-repo"

sed deletes everything up to the last / along with the .git extension (if present), and keeps the match of group \1, which is a run of characters other than a dot ([^.]+).

0

Using Hicham's awesome answer above, I came up with this, which uses sed to output exactly what I needed: org/reponame.

output=$(echo "${git_url}" | sed -nr 's/^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+)\.git$/\4\/\5/p')

Works well on Ubuntu, but doesn't work with the sed available by default on macOS.

0

A slight modification to @Hicham's answer

^(https|git)(:\/\/|@)([^\/:]+)[\/:]([^\/:]+)\/(.+?)(\.git)?$

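Because (.+?) is lazy, the optional .git suffix lands in its own group instead of in the repository name. Note that lazy quantifiers need a PCRE-style engine; POSIX ERE, which bash's [[ =~ ]] uses, does not define them.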

0

After fiddling for half a day in regex101 and using input from @womble and the others, I came up with this, which also has named capture groups to show what is handled where. It may even help me in the near future :P

/^((?<protocol>https?|ssh|git|ftps?):\/\/)?((?<user>[^\/@]+)@)?(?<host>[^\/:]+)[\/:](?<port>[^\/:]+)\/(?<path>.+\/)?(?<repo>.+?)(?<suffix>\.git[\/]?)?$/ 

It basically lets you pick out the repo name (see the repo group) from a repo URL ending in any of

.../reponame.git, .../reponame.git/, .../reponame and .../reponame/

as it handles the optional .git suffix.
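Since bash's `[[ =~ ]]` uses POSIX extended regular expressions, which have neither named groups nor lazy quantifiers, the pattern above cannot be pasted into bash as-is. A rough sketch that covers only the repo-name part for the four URL endings listed could look like this; `repo_name_any_suffix` is a hypothetical name and the simplified pattern is an assumption, not a translation of the full regex.

```shell
#!/usr/bin/env bash
# repo_name_any_suffix is a hypothetical helper, not the regex from the answer.
repo_name_any_suffix() {
    # Last path segment, followed by an optional trailing slash
    local re='([^/:]+)/?$'
    [[ $1 =~ $re ]] || return 1
    # Remove ".git" from the captured segment if it is there
    printf '%s\n' "${BASH_REMATCH[1]%.git}"
}

for url in \
    "https://github.com/some-user/reponame.git" \
    "https://github.com/some-user/reponame.git/" \
    "https://github.com/some-user/reponame" \
    "https://github.com/some-user/reponame/"
do
    repo_name_any_suffix "$url"   # prints reponame each time
done
```

Host, user and path would still need the fuller pattern in a PCRE-capable tool.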
