(sed.info.gz) Reporting Bugs

 
 7 Reporting Bugs
 ****************
 
 Email bug reports to <bug-sed@gnu.org>.  Also, please include the
 output of `sed --version' in the body of your report if at all possible.
 
    Please do not send a bug report like this:
 
      while building frobme-1.3.4
      $ configure
      error--> sed: file sedscr line 1: Unknown option to 's'
 
    If your favorite package fails to configure with GNU `sed', take a
 few extra minutes to identify the specific problem and make a
 stand-alone test case.  Unlike with other programs such as C compilers,
 making such test cases for `sed' is quite simple.
 
    A stand-alone test case includes all the data necessary to perform
 the test, and the specific invocation of `sed' that causes the problem.
 The smaller a stand-alone test case is, the better.  A test case should
 not involve something as far removed from `sed' as "try to configure
 frobme-1.3.4".  Yes, that is in principle enough information to look
 for the bug, but that is not a very practical prospect.
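
    As an illustration, a stand-alone test case can be a short shell
 script that carries its own input and the one `sed' invocation that
 misbehaves.  This is only a sketch: the file name `input.txt' and the
 substitution are hypothetical placeholders for whatever reproduces
 your problem.

```shell
#!/bin/sh
# Sketch of a stand-alone test case; input.txt and the s command are
# hypothetical placeholders for the data and script that show the bug.

# Identify the sed under test (include this output in the report).
sed --version | head -n 1

# All input data is created here, so no other files are needed.
printf 'hello world\n' > input.txt

# The single invocation that shows the problem, exactly as run.
sed 's/world/there/' input.txt    # prints: hello there
```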
 
    Here are a few commonly reported bugs that are not bugs.
 
 `N' command on the last line
      Most versions of `sed' exit without printing anything when the `N'
      command is issued on the last line of a file.  GNU `sed' prints
      the pattern space before exiting unless, of course, the `-n'
      command switch has been specified.  This choice is by design.
 
      For example, the behavior of
           sed N foo bar
      would depend on whether foo has an even or an odd number of
      lines(1).  Or, when writing a script to read the next few lines
      following a pattern match, traditional implementations of `sed'
      would force you to write something like
           /foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N }
      instead of just
           /foo/{ N;N;N;N;N;N;N;N;N; }
 
      In any case, the simplest workaround is to use `$d;N' in scripts
      that rely on the traditional behavior, or to set the
      `POSIXLY_CORRECT' environment variable to a non-empty value.
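
      The following sketch, which assumes GNU `sed' and a POSIX shell,
      contrasts the GNU behavior with the two workarounds above on a
      three-line file, and also shows the traditional `$!N' guard;
      `three.txt' is a hypothetical scratch file.

```shell
# Three input lines: the final N has no next line to append.
printf '1\n2\n3\n' > three.txt

# GNU behavior: pattern space "3" is printed before sed exits.
sed N three.txt                      # prints 1, 2, 3 (one per line)

# Traditional behavior via the $d;N workaround: line 3 is dropped.
sed '$d;N' three.txt                 # prints 1, 2

# The same traditional behavior via the environment variable.
POSIXLY_CORRECT=1 sed N three.txt    # prints 1, 2

# The portable guard: never run N on the last line.
sed '$!N' three.txt                  # prints 1, 2, 3
```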
 
 Regex syntax clashes (problems with backslashes)
      `sed' uses the POSIX basic regular expression syntax.  According to
      the standard, the meaning of some escape sequences is undefined in
      this syntax;  notable in the case of `sed' are `\|', `\+', `\?',
      `\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'.
 
      As in all GNU programs that use POSIX basic regular expressions,
      `sed' interprets these escape sequences as special characters.
      So, `x\+' matches one or more occurrences of `x'.  `abc\|def'
      matches either `abc' or `def'.
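
      A few one-liners illustrating these GNU operators, assuming GNU
      `sed'.  The last command shows the fix for a script that expected
      `\+' to be a literal `+': drop the backslash, since an unescaped
      `+' is an ordinary character in a basic regular expression.

```shell
# In GNU sed's basic regular expressions, \+ means "one or more".
echo 'axxxb' | sed 's/x\+/-/'                 # prints a-b

# \| is alternation, so both words are replaced.
echo 'abc def ghi' | sed 's/abc\|def/X/g'     # prints X X ghi

# To match a literal plus sign, write + without a backslash.
echo '1+2' | sed 's/1+2/3/'                   # prints 3
```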
 
      This syntax may cause problems when running scripts written for
      other `sed's.  Some `sed' programs have been written with the
      assumption that `\|' and `\+' match the literal characters `|' and
      `+'.  Such scripts must be modified by removing the spurious
      backslashes if they are to be used with modern implementations of
      `sed', like GNU `sed'.
 
      On the other hand, some scripts use `s|abc\|def||g' to remove
      occurrences of _either_ `abc' or `def'.  While this worked until
      `sed' 4.0.x, newer versions interpret this as removing the string
      `abc|def'.  This is again undefined behavior according to POSIX,
      and this interpretation is arguably more robust: older `sed's, for
      example, required that the regex matcher parse `\/' as `/' in the
      common case of escaping a slash, which is again undefined
      behavior.  The new behavior avoids this, which matters because
      the regex matcher is only partially under our control.
 
      In addition, this version of `sed' supports several escape
      characters (some of which are multi-character) to insert
      non-printable characters in scripts (`\a', `\c', `\d', `\o', `\r',
      `\t', `\v', `\x').  These can cause similar problems with scripts
      written for other `sed's.
 
 `-i' clobbers read-only files
      In short, `sed -i' will let you delete the contents of a read-only
      file, and in general the `-i' option (*note Invocation: Invoking
      sed.) lets you clobber protected files.  This is not a bug, but
      rather a consequence of how the Unix filesystem works.
 
      The permissions on a file say what can happen to the data in that
      file, while the permissions on a directory say what can happen to
      the list of files in that directory.  `sed -i' never opens for
      writing a file that is already on disk.  Rather, it works on a
      temporary file that is finally renamed to the original name.
      Renaming or deleting files actually modifies the contents of the
      directory, so the operation depends on the permissions of the
      directory, not of the file.  For this same reason, `sed' does not
      let you use `-i' on a writable file in a read-only directory, and
      `-i' breaks the link when the name it edits is a hard or symbolic
      link.
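
      A sketch that demonstrates both effects in a scratch directory
      created with `mktemp -d' (all file names are hypothetical):
      `sed -i' replaces the contents of a write-protected file, and
      after editing one name of a hard-linked pair, the other name still
      shows the old contents.  The write-protected case succeeds because
      the scratch directory itself is writable.

```shell
# Work in a scratch directory that we are allowed to modify.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# A read-only file: sed -i writes a new file and renames it over the
# old name, which needs write permission on the directory only.
printf 'old\n' > ro.txt
chmod a-w ro.txt
sed -i 's/old/new/' ro.txt
cat ro.txt                  # prints new

# A hard link: after sed -i, the two names no longer share one file.
printf 'old\n' > a.txt
ln a.txt b.txt
sed -i 's/old/new/' a.txt
cat a.txt b.txt             # prints new, then old
```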
 
 `0a' does not work (gives an error)
      There is no line 0.  0 is a special address that is only used to
      make addresses like `0,/RE/' active when the script starts.  If
      you write `1,/abc/d' and the first line contains the word `abc',
      that match is ignored, because an address range must span at
      least two lines (barring the end of the file).  What you probably
      wanted is to delete every line up to and including the first one
      containing `abc', and this is obtained with `0,/abc/d'.
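
      The difference is easy to see on a four-line input in which `abc'
      occurs on lines 1 and 3.  `in.txt' is a hypothetical scratch file,
      and since `0,/RE/' is a GNU extension, this assumes GNU `sed'.

```shell
printf 'abc\nx\nabc\ny\n' > in.txt

# The range starts on line 1, but its end is only searched for from
# line 2 on, so it runs to the second "abc": lines 1-3 are deleted.
sed '1,/abc/d' in.txt       # prints y

# With address 0, the end regex is tested on line 1 as well.
sed '0,/abc/d' in.txt       # prints x, abc, y (one per line)
```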
 
 `[a-z]' is case insensitive
      You are encountering problems with locales.  POSIX mandates that
      `[a-z]' uses the current locale's collation order; in C parlance,
      that means using `strcoll(3)' instead of `strcmp(3)'.  Some
      locales have a case-insensitive collation order, others don't.
 
      Another problem is that `[a-z]' tries to use collation symbols.
      This only happens if you are on the GNU system, using GNU libc's
      regular expression matcher instead of compiling the one supplied
      with GNU sed.  In a Danish locale, for example, the regular
      expression `^[a-z]$' matches the string `aa', because this is a
      single collating symbol that comes after `a' and before `b'; `ll'
      behaves similarly in Spanish locales, or `ij' in Dutch locales.
 
      To work around these problems, which may cause bugs in shell
      scripts, set the `LC_COLLATE' and `LC_CTYPE' environment variables
      to `C'.
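
      A minimal sketch of that workaround applied to a single command.
      It runs in a subshell that also unsets `LC_ALL', because a
      non-empty `LC_ALL' takes precedence over `LC_COLLATE' and
      `LC_CTYPE'.

```shell
# In the C locale, [a-z] is a plain byte range: only the ASCII
# lowercase letters are removed, whatever the user's locale is.
echo 'ABC abc' | (unset LC_ALL; LC_COLLATE=C LC_CTYPE=C sed 's/[a-z]//g')
# prints "ABC " (with a trailing space)
```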
 
 `s/.*//' does not clear pattern space
      This happens if your input stream includes invalid multibyte
      sequences.  POSIX mandates that such sequences are _not_ matched
      by `.', so that `s/.*//' will not clear pattern space as you would
      expect.  In fact, there is no way to clear sed's buffers in the
      middle of the script in most multibyte locales (including UTF-8
      locales).  For this reason, GNU `sed' provides a `z' command (for
      `zap') as an extension.
 
      As with `[a-z]', a workaround is to set the `LC_COLLATE' and
      `LC_CTYPE' environment variables to `C', so that every byte counts
      as a valid character and is matched by `.'.
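
      A sketch of both remedies, assuming GNU `sed' (for `z').  The
      input file `bad.txt' is hypothetical and contains the byte 0xFF,
      which never occurs in valid UTF-8, so in a UTF-8 locale `.' does
      not match it.

```shell
# "ab", the invalid byte 0xFF (octal 377), then "cd".
printf 'ab\377cd\n' > bad.txt

# Remedy 1 (GNU extension): z empties the pattern space unconditionally.
sed 'z;s/^/cleared/' bad.txt        # prints cleared

# Remedy 2: in the C locale every byte is a character, so .* matches all.
(unset LC_ALL; LC_COLLATE=C LC_CTYPE=C sed 's/.*/cleared/' bad.txt)
# prints cleared
```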
 
    ---------- Footnotes ----------
 
    (1) which is the actual "bug" that prompted the change in behavior
 