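How do I urlencode data for a curl command?

I am trying to write a bash script for testing that takes a parameter and sends it through curl to a web site. To make sure special characters are handled properly, I need to URL-encode the value. What is the best way to do this?

Here is my basic script so far: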


#!/bin/bash
host=${1:?'bad host'}
value=$2
shift
shift
curl -v -d "param=${value}" "http://${host}/somepath" "$@"


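Alternatively, use curl --data-urlencode. From man curl:

This posts data, similar to the other --data options, with the exception that this performs URL-encoding.

Example usage:


curl --data-urlencode "paramName=param" www.example.com

See man curl for more information.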
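Applied to the script from the question, this could look like the sketch below. It wraps the call in a function, `send_param`, and reads an optional `CURL_BIN` variable so the command line can be dry-run with `echo`; both names are hypothetical and not part of curl.

```shell
#!/bin/bash
# Sketch: the question's script with curl doing the encoding itself.
# send_param and CURL_BIN are illustrative names, not curl features.
send_param() {
    local host=${1:?'bad host'}
    local value=$2
    shift
    shift
    # In the name=content form, curl URL-encodes only the content part.
    "${CURL_BIN:-curl}" -v --data-urlencode "param=${value}" \
        "http://${host}/somepath" "$@"
}

# Dry run: print the arguments instead of contacting the host.
# CURL_BIN=echo send_param example.com 'a b&c'
```

Any extra arguments after the value are passed through to curl unchanged, as in the original script.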

Another way is to use Perl's URI::Escape module and its uri_escape function on the second line of your bash script:


...

value="$(perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' "$2")"
...

Here is the pure BASH answer.


rawurlencode() {
  local string="${1}"
  local strlen=${#string}
  local encoded=""
  local pos c o

  for (( pos=0 ; pos<strlen ; pos++ )); do
     c=${string:$pos:1}
     case "$c" in
        [-_.~a-zA-Z0-9] ) o="${c}" ;;
        * )               printf -v o '%%%02x' "'$c"
     esac
     encoded+="${o}"
  done
  echo "${encoded}"    # You can either set a return variable (FASTER)
  REPLY="${encoded}"   #+or echo the result (EASIER)... or both... :p
}

You can use it in two ways:


easier: echo http://url/q?=$( rawurlencode "$args" )
faster: rawurlencode "$args"; echo http://url/q?${REPLY}
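When a request carries several parameters, each key and value has to be encoded separately before they are joined with `=` and `&`. The following is a minimal sketch of that idea. `encode_component` and `build_query` are hypothetical helper names, and the encoder assumes ASCII input only.

```shell
#!/bin/bash
# Hypothetical helper: percent-encode one ASCII string into REPLY.
encode_component() {
    local s=$1 out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case "$c" in
            [-_.~a-zA-Z0-9]) out+=$c ;;
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;
        esac
    done
    REPLY=$out
}

# Hypothetical helper: turn "key value key value ..." into key=value&key=value.
build_query() {
    local query="" key val
    while (( $# >= 2 )); do
        encode_component "$1"; key=$REPLY
        encode_component "$2"; val=$REPLY
        query+="${query:+&}${key}=${val}"
        shift 2
    done
    printf '%s\n' "$query"
}

build_query q 'a b&c' lang 'en/us'   # prints q=a%20b%26c&lang=en%2Fus
```

Using the REPLY variable inside the loop avoids one subshell per component, which is the "faster" calling style mentioned above.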

Here is the matching rawurldecode() function:


# Returns a string in which the sequences with percent (%) signs followed by
# two hex digits have been replaced with literal characters.
rawurldecode() {

  # This is perhaps a risky gambit, but since all escape characters must be
  # encoded, we can replace %NN with \xNN and pass the lot to printf -b, which
  # will decode hex for us

  printf -v REPLY '%b' "${1//%/\\x}" # You can either set a return variable (FASTER)

  echo "${REPLY}"  #+or echo the result (EASIER)... or both... :p
}

With the matching set, we can now perform some simple tests:


$ diff rawurlencode.inc.sh \
        <( rawurldecode "$( rawurlencode "$( cat rawurlencode.inc.sh )" )" ) \
        && echo Matched

Output: Matched
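Form-encoded data (application/x-www-form-urlencoded) writes a space as `+` rather than `%20`, which the decoder above leaves untouched. A small sketch of a variant that handles this, under the assumption that the input contains no literal backslashes (printf '%b' would interpret them); the name `formurldecode` is made up for this example:

```shell
#!/bin/bash
# Hypothetical variant of rawurldecode for form-encoded input.
formurldecode() {
    local s=${1//+/ }                     # '+' stands for a space in form data
    printf -v REPLY '%b' "${s//%/\\x}"    # %NN -> \xNN, decoded by %b
    printf '%s\n' "$REPLY"
}

formurldecode 'q=a+b%26c'   # prints q=a b&c
```

The `+` replacement has to happen before the percent-decoding step, so that an encoded plus sign (`%2B`) survives as a literal `+`.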

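And if you really feel you need an external tool (it will run a lot faster, and may cope with binary files and such), I found this on my OpenWRT router: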


replace_value=$(echo "$replace_value" | sed -f /usr/lib/ddns/url_escape.sed)

Where url_escape.sed was a file that contained these rules:


# sed url escaping
s:%:%25:g
s: :%20:g
s:<:%3C:g
s:>:%3E:g
s:#:%23:g
s:{:%7B:g
s:}:%7D:g
s:|:%7C:g
s:\\:%5C:g
s:\^:%5E:g
s:~:%7E:g
s:\[:%5B:g
s:]:%5D:g
s:`:%60:g
s:;:%3B:g
s:/:%2F:g
s:?:%3F:g
s^:^%3A^g
s:@:%40:g
s:=:%3D:g
s:&:%26:g
s:\$:%24:g
s:!:%21:g
s:\*:%2A:g
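The order of the rules matters: the `%` rule comes first because every later rule introduces new `%` characters, which would otherwise be escaped a second time. A two-rule sketch of the difference (the function names are invented for this illustration):

```shell
#!/bin/sh
# Correct order: escape existing '%' first, then everything else.
escape_right_order() { sed -e 's:%:%25:g' -e 's: :%20:g'; }

# Wrong order: the '%' produced by the space rule gets escaped again.
escape_wrong_order() { sed -e 's: :%20:g' -e 's:%:%25:g'; }

printf '%s\n' '50% off' | escape_right_order   # 50%25%20off
printf '%s\n' '50% off' | escape_wrong_order   # 50%25%2520off
```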


echo -ne 'some random\nbytes' | xxd -plain | tr -d '\n' | sed 's/\(..\)/%\1/g'

Edit:

http://qa.debian.org/popcon-png.php?packages=vim-common,bsdmainutils&show_installed=1&want_legend=1&want_ticks=1

Nevertheless, here is a version that uses hexdump instead of xxd, which avoids the tr call:


echo -ne 'some random\nbytes' | hexdump -v -e '/1 "%02x"' | sed 's/\(..\)/%\1/g'
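Both pipelines percent-encode every byte, including letters and digits, which is valid but verbose. If neither xxd nor hexdump is installed, the same idea should work with od, which POSIX specifies. This is a sketch only; `encode_all_bytes` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode every byte read from stdin.
encode_all_bytes() {
    od -An -v -tx1 |             # one two-digit hex value per input byte
        tr -d ' \n' |            # drop od's spacing and line breaks
        sed 's/\(..\)/%\1/g'     # prefix each hex pair with '%'
}

printf '%s' 'a b' | encode_all_bytes   # %61%20%62
```

Because the input is processed byte by byte, multi-byte UTF-8 characters come out as one `%NN` group per byte.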


perl -p -e 's/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg'


I find it more readable in python:


encoded_value=$(python -c "import urllib; print urllib.quote('''$value''')")
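That one-liner uses Python 2 syntax, and it pastes the value into the program text. A sketch for Python 3, assuming a `python3` binary is on the PATH, that passes the value as an argument instead. The wrapper name `py_urlencode` is hypothetical; `urllib.parse.quote` is the Python 3 location of `quote`.

```shell
#!/bin/sh
# Hypothetical wrapper around Python 3's urllib.parse.quote.
py_urlencode() {
    # safe="" also encodes '/', which quote() leaves alone by default.
    python3 -c 'import sys, urllib.parse
print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

encoded_value=$(py_urlencode "it's a/b")
echo "$encoded_value"   # it%27s%20a%2Fb
```

Passing the value through sys.argv means quotes inside it cannot break out of the Python string literal.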

urllib is in the standard library. An example value with awkward characters:


"http://www.rai.it/dl/audio/""1264165523944Ho servito il re d'Inghilterra - Puntata 7

One more variant; it may be ugly, but it is simple:


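urlencode() {
    local data
    if [[ $# != 1 ]]; then
        echo "Usage: $0 string-to-urlencode"
        return 1
    fi
    # The empty URL is deliberate: curl is expected to exit with code 3
    # (malformed URL), while %{url_effective} still holds the encoded query.
    data="$(curl -s -o /dev/null -w %{url_effective} --get --data-urlencode "$1" "")"
    if [[ $? != 3 ]]; then
        echo "Unexpected error" 1>&2
        return 2
    fi
    echo "${data##/?}"
    return 0
}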

Direct link to the awk version: http://www.shelldorado.com/scripts/cmds/urlencode
I have used it for years and it works like a charm.


:
##########################################################################
# Title : urlencode - encode URL data
# Author : Heiner Steven (heiner.steven@odn.de)
# Date : 2000-03-15
# Requires : awk
# Categories : File Conversion, WWW, CGI
# SCCS-Id. : @(#) urlencode 1.4 06/10/29
##########################################################################
# Description
# Encode data according to
# RFC 1738: "Uniform Resource Locators (URL)" and
# RFC 1866: "Hypertext Markup Language - 2.0" (HTML)
#
# This encoding is used i.e. for the MIME type
# "application/x-www-form-urlencoded"
#
# Notes
# o The default behaviour is not to encode the line endings. This
# may not be what was intended, because the result will be
# multiple lines of output (which cannot be used in an URL or a
# HTTP "POST" request). If the desired output should be one
# line, use the "-l" option.
#
# o The "-l" option assumes, that the end-of-line is denoted by
# the character LF (ASCII 10). This is not true for Windows or
# Mac systems, where the end of a line is denoted by the two
# characters CR LF (ASCII 13 10).
# We use this for symmetry; data processed in the following way:
# cat | urlencode -l | urldecode -l
# should (and will) result in the original data
#
# o Large lines (or binary files) will break many AWK
# implementations. If you get the message
# awk: record `...' too long
# record number xxx
# consider using GNU AWK (gawk).
#
# o urlencode will always terminate it's output with an EOL
# character
#
# Thanks to Stefan Brozinski for pointing out a bug related to non-standard
# locales.
#
# See also
# urldecode
##########################################################################

PN=`basename "$0"` # Program name
VER='1.4'

: ${AWK=awk}

Usage () {
    echo >&2 "$PN - encode URL data, $VER
usage: $PN [-l] [file ...]
    -l:  encode line endings (result will be one line of output)

The default is to encode each input line on its own."
    exit 1
}

Msg () {
    for MsgLine
    do echo "$PN: $MsgLine" >&2
    done
}

Fatal () { Msg "$@"; exit 1; }

set -- `getopt hl "$@" 2>/dev/null` || Usage
[ $# -lt 1 ] && Usage # "getopt" detected an error

EncodeEOL=no
while [ $# -gt 0 ]
do
    case "$1" in
        -l) EncodeEOL=yes;;
        --) shift; break;;
        -h) Usage;;
        -*) Usage;;
        *)  break;; # First file name
    esac
    shift
done

LANG=C export LANG
$AWK '
    BEGIN {
        # We assume an awk implementation that is just plain dumb.
        # We will convert an character to its ASCII value with the
        # table ord[], and produce two-digit hexadecimal output
        # without the printf("%02X") feature.

        EOL = "%0A" # "end of line" string (encoded)
        split ("1 2 3 4 5 6 7 8 9 A B C D E F", hextab, " ")
        hextab [0] = 0
        for ( i=1; i<=255; ++i ) ord [ sprintf ("%c", i) "" ] = i + 0
        if ("'"$EncodeEOL"'" == "yes") EncodeEOL = 1; else EncodeEOL = 0
    }
    {
        encoded = ""
        for ( i=1; i<=length ($0); ++i ) {
            c = substr ($0, i, 1)
            if ( c ~ /[a-zA-Z0-9.-]/ ) {
                encoded = encoded c # safe character
            } else if ( c == " " ) {
                encoded = encoded "+" # special handling
            } else {
                # unsafe character, encode it as a two-digit hex-number
                lo = ord [c] % 16
                hi = int (ord [c] / 16);
                encoded = encoded "%" hextab [hi] hextab [lo]
            }
        }
        if ( EncodeEOL ) {
            printf ("%s", encoded EOL)
        } else {
            print encoded
        }
    }
    END {
        #if ( EncodeEOL ) print ""
    }
' "$@"
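Stripped of option handling, the core of the script is a per-character loop over each input line. A condensed sketch of just that loop follows, assuming an awk whose sprintf supports `%02X` (which the original deliberately avoids relying on). The wrapper name `awk_urlencode` is hypothetical and this is not the author's code.

```shell
#!/bin/sh
# Hypothetical condensed form of the encoding loop; reads stdin line by line.
awk_urlencode() {
    LC_ALL=C awk '
        BEGIN { for (i = 1; i <= 255; i++) ord[sprintf("%c", i)] = i }
        {
            out = ""
            for (i = 1; i <= length($0); i++) {
                c = substr($0, i, 1)
                if (c ~ /[a-zA-Z0-9.-]/) out = out c      # safe character
                else if (c == " ")      out = out "+"    # form-style space
                else out = out sprintf("%%%02X", ord[c]) # everything else
            }
            print out
        }'
}

printf '%s\n' 'a b&c=d' | awk_urlencode   # a+b%26c%3Dd
```

Like the default mode of the full script, this encodes each input line on its own and leaves the line endings unencoded.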


url=$(echo "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/!/%21/g' -e 's/"/%22/g' -e 's/#/%23/g' -e 's/\$/%24/g' -e 's/&/%26/g' -e 's/'\''/%27/g' -e 's/(/%28/g' -e 's/)/%29/g' -e 's/\*/%2a/g' -e 's/+/%2b/g' -e 's/,/%2c/g' -e 's/-/%2d/g' -e 's/\./%2e/g' -e 's/\//%2f/g' -e 's/:/%3a/g' -e 's/;/%3b/g' -e 's/</%3c/g' -e 's/=/%3d/g' -e 's/>/%3e/g' -e 's/?/%3f/g' -e 's/@/%40/g' -e 's/\[/%5b/g' -e 's/\\/%5c/g' -e 's/]/%5d/g' -e 's/\^/%5e/g' -e 's/_/%5f/g' -e 's/`/%60/g' -e 's/{/%7b/g' -e 's/|/%7c/g' -e 's/}/%7d/g' -e 's/~/%7e/g')

This will encode the string in $1 and store the result in $url.

...