Link SVN files for a system repository
if you don't want to have the whole /etc under version control
[Intro] [C program]

Intro

Subversion, svn for short, is a handy version control system. Nowadays, Git is more in fashion, especially for team projects. Yet, for single individuals or small groups, Subversion works well and is somewhat simpler. However, it has a couple of drawbacks. One, it keeps its metadata in .svn subdirectories inside the controlled tree (one in every controlled directory before version 1.7, a single one at the top since then). Two, it saves symbolic links as symbolic links, which implies the content is lost unless the linked file is itself under version control in the same repository. So, to back up just selected files from /etc or similar large directories, we want either to add hard links to them or to make flat copies of them in a single directory inside our repository.

Subversion is extensible in that it provides for properties; that is, versioned metadata attached to any versioned object. Properties used by svn have names beginning with svn:. Typically, svn automatically sets svn:eol-style and svn:mime-type on each file. Users may maintain svn:ignore on directories, using svn prop* subcommands (mostly propedit, abbreviated pe).

Using a custom property named lnsvn, we keep track of files located outside the repository. The first version of this program only did links, not copies, so the default directory containing links or copies is called LINKS (all capitals). The lnsvn property of this directory contains a list of filenames (full paths) that are outside the repository but in the same partition. The base name of each file, unaltered, becomes either a hard link to or a flat copy of the original file. Assuming you already have a versioned directory, proceed as follows to start versioning links to selected files:

here:repository$ svn mkdir LINKS
here:repository$ svn pe lnsvn LINKS
... edit the file (see text below) ...
Saving the file sets new value for property 'lnsvn' on 'LINKS'
here:repository$ lnsvn
It adds links in LINKS
here:repository$ svn add LINKS/*
A         LINKS/an_added_file
...
here:repository$ svn commit

In the property file, type the full path of each file you want to link, one per line. If you want a file to be copied, write COPY /full/path/of/file rather than just /full/path/of/file. Save the property. Running lnsvn actually creates the copies or the links. Finally, add the resulting files to svn version control. Of course, before calling the program below you should have compiled it and placed it somewhere on the PATH with setuid permission (so it can read anything).

You can read the format of the property file in the fgets() loop below: leading spaces are discarded, # comments and empty lines are ignored, relative paths are not permitted, and the keyword COPY can be put before the filename to force copying the file instead of hard linking it. If the owner of the file is not the same as the owner of the LINKS directory, the file is copied anyway. That's necessary for files, like /etc/crontab, that don't want to be hard linked. COPY is also needed for files in different partitions (recall that hard links cannot cross partitions).
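To make the format concrete, here is a minimal standalone sketch of how a single property line could be classified. It is not the program's own code: the names parse_line and line_kind are made up for illustration, and, unlike the loop in the program, it also rejects a relative path after COPY.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>

/* Outcome of parsing one property line (hypothetical names). */
enum line_kind { LINE_SKIP, LINE_LINK, LINE_COPY, LINE_ERROR };

/*
 * Parse one line in place.  On LINE_LINK or LINE_COPY, *path points
 * at the absolute filename inside the buffer; otherwise it is NULL.
 */
static enum line_kind parse_line(char *line, char **path)
{
   assert(line && path);
   *path = NULL;

   /* left trim */
   while (isspace((unsigned char)*line))
      ++line;

   /* right trim; this also drops the trailing newline */
   size_t len = strlen(line);
   while (len > 0 && isspace((unsigned char)line[len - 1]))
      line[--len] = 0;

   /* empty lines and # comments are ignored */
   if (len == 0 || *line == '#')
      return LINE_SKIP;

   enum line_kind kind = LINE_LINK;
   if (*line != '/')
   {
      /* the first word must be a directive followed by a filename */
      char *word = line;
      while (*line && !isspace((unsigned char)*line))
         ++line;
      if (*line == 0)
         return LINE_ERROR;   /* relative path or bare directive */

      *line++ = 0;            /* terminate the directive */
      while (isspace((unsigned char)*line))
         ++line;

      if (strcasecmp(word, "copy") != 0)
         return LINE_ERROR;   /* unrecognized directive */
      kind = LINE_COPY;
   }

   if (*line != '/')
      return LINE_ERROR;      /* e.g. "COPY relative/path" */

   *path = line;
   return kind;
}
```

With this sketch, a line such as "COPY /etc/crontab" yields LINE_COPY with the path /etc/crontab, a bare "/some/absolute/file" yields LINE_LINK, and a line starting with # yields LINE_SKIP.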

The program reads the lnsvn property and checks every link/copy. Just remember to run it before committing.
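The check itself boils down to comparing two stat() results. The following is a sketch of that idea under stated assumptions, not an excerpt of the program: compare_files and the match enum are illustrative names, and the device number is compared along with the inode because inode numbers are unique only within one filesystem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>

/* How a target relates to its original (hypothetical names). */
enum match { MATCH_NONE, MATCH_LINK, MATCH_COPY };

static enum match compare_files(char const *orig, char const *target)
{
   assert(orig && target);

   struct stat so, st;
   if (stat(orig, &so) != 0 || stat(target, &st) != 0)
      return MATCH_NONE;   /* one of the two cannot be examined */

   /* a hard link is the very same file: same device, same inode */
   if (so.st_dev == st.st_dev && so.st_ino == st.st_ino)
      return MATCH_LINK;

   /* a copy is taken as current if size and mtime both agree */
   if (so.st_size == st.st_size &&
      so.st_mtim.tv_sec == st.st_mtim.tv_sec &&
      so.st_mtim.tv_nsec == st.st_mtim.tv_nsec)
      return MATCH_COPY;

   return MATCH_NONE;      /* stale: relink or copy again */
}
```

Comparing size and modification time is a cheap heuristic rather than a proof of equality. It works here because the copy's mtime is set to that of the original when the copy is made.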

C program

// lnsvn.c written by vesely in milan on Thu 25 Nov 2021

// gcc -g -W -Wall -o ~/bin/lnsvn lnsvn.c
// sudo chown root:staff ~/bin/lnsvn
// sudo chmod u+s ~/bin/lnsvn


char help_string[] =
" Usage:\n"
"\n"
"    lnsvn [options] [TARGET...]\n"
"\n"
" The only options are -v to increase verbosity and -h for this help\n"
"\n"
" Read the \"lnsvn\" property of each TARGET, by default the LINKS \n"
" directory.  The property should contain a list of \"original\" files, \n"
" given with full path outside the repository. The basename of each \n"
" file becomes either a hard link to or a flat copy of the listed \n"
" original file, in the current TARGET.\n"
"\n"
" lnsvn checks that the correspondence holds, by verifying the inode \n"
" number of the links or the size and date of the copies.  If they don't \n"
" match, the target file in the working copy is deleted and relinked to \n"
" or copied from the original file.  Linking or copying is determined\n"
" by the COPY prefix in the property and by the file owner.  Files that\n"
" have an st_uid different from that of the LINKS directory are copied,\n"
" as if the COPY prefix was specified.\n"
"\n"
" Target files should never be edited!  Editing the original file can\n"
" remove the link, as the editor may create a new file on editing.\n"
"\n"
" If the original file doesn't exist, lnsvn signals an error.\n"
" If a file in the target directory has no original file, a warning.\n";

/*
* TODO:
* - run svn add for new links
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>     // PATH_MAX
#include <string.h>
#include <strings.h>    // strcasecmp
#include <ctype.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/sendfile.h>
#include <unistd.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdarg.h>
#include <errno.h>

#include <assert.h>


static const char *program_name;
static inline char *my_basename(char const *name) // neither GNU nor POSIX...
{
   char *b = strrchr(name, '/');
   if (b)
      return b + 1;
   return (char*)name;
}


#if defined __GNUC__
__attribute__ ((format(printf, 1, 2)))
#endif
static void print_err(char const *fmt, ...)
{
   fprintf(stderr, "%s: ", program_name);
   va_list ap;
   va_start(ap, fmt);
   vfprintf(stderr, fmt, ap);
   va_end(ap);
   fputc('\n', stderr);
}

typedef struct lnsvn
{
   struct lnsvn *next;
   char const *basename; // inside orig
   int lineno;           // line in property
   uint8_t copy;         // copy this file instead of linking it
   char orig[1];         // original file (variable size structure)
} lnsvn;

// global options
static int verbose = 0;
typedef enum verbose_what
{
   verbose_none,
   verbose_file,         // -v: report each file checked
   verbose_load          // -vv: also report each property line loaded
} verbose_what;


static void free_lnsvn(lnsvn *l)
{
   while (l)
   {
      lnsvn *tmp = l->next;
      free(l);
      l = tmp;
   }
}

static int load_lsvn(char const *target, lnsvn **base)
/*
* Load a linked list of lnsvn in basename collating order.
* Return -1 on hard error, 1 on soft error, 0 on OK.
*/
{
   int rtc = 0;
   assert(base);

   *base = NULL;

   int pipefd[2];
   if (pipe(pipefd) != 0)
   {
      print_err("pipe failure: %s", strerror(errno));
      return -1;
   }

   pid_t pid = fork();
   if (pid == -1)
   {
      print_err("fork failure: %s", strerror(errno));
      close(pipefd[0]);
      close(pipefd[1]);
      return -1;
   }

   if (pid == 0) // child, get property on stdout
   {
      close(pipefd[0]); // unused read end
      close(STDOUT_FILENO);
      errno = 0;
      if (dup(pipefd[1]) != STDOUT_FILENO)
      {
         print_err("dup failure: %s", strerror(errno));
         close(pipefd[1]);
         _exit(1);
      }

      // Drop privileges if setuid.  The real uid is the invoking user;
      // in a setuid program the saved uid equals the privileged one.
      uid_t ruid, euid, suid;
      if (getresuid(&ruid, &euid, &suid) != 0 ||
         setresuid(ruid, ruid, ruid) != 0)
      {
         print_err("get/set uid failure: %s", strerror(errno));
         close(pipefd[1]);
         _exit(1);
      }

      // no path argument: svn reads the property of the current directory
      execlp("svn", "svn", "propget", "lnsvn", (char*)NULL);
      print_err("execlp failure: %s", strerror(errno));
      _exit(1);
   }

   // parent, read the property and build the list
   close(pipefd[1]); // unused write end
   FILE *filepipe = fdopen(pipefd[0], "r");
   if (filepipe == NULL)
   {
      print_err("fdopen failure: %s", strerror(errno));
      close(pipefd[0]);
      rtc = -1;
   }
   else
   {
      int lineno = 0;
      char *s, buf[PATH_MAX];
      while ((s = fgets(buf, sizeof buf, filepipe)) != NULL)
      {
         ++lineno;
         char *eol = strchr(s, '\n');
         int ch;
         if (eol == NULL)
         {
            rtc = 1;
            print_err("line %d of property too long, max=%zu",
               lineno, sizeof(buf));
            while ((ch = fgetc(filepipe)) != EOF)
               if (ch == '\n')
                  break;
         }
         else
         {
            // left trim, discard comments
            while (isspace(ch = *(unsigned char*)s) && ch != 0)
               ++s;
            if (ch == '#')
               continue;

            // right trim, discard empty lines
            --eol;
            while (s < eol && isspace(*(unsigned char*)eol))
               --eol;
            if (s >= eol)
               continue;

            *++eol = 0;

            // check special directive
            int copy = 0;
            if (*s != '/')
            {
               char *d = s;
               while (s < eol && !isspace(*(unsigned char*)s))
                  ++s;
               if (s >= eol)
               {
                  print_err(
                     "line %d: relative file or bare directive: %s",
                     lineno, d);
                  continue;
               }

               // left trim filename
               *s++ = 0;
               while (s < eol && isspace(*(unsigned char*)s))
                  ++s;
               assert(s < eol);

               if (strcasecmp(d, "copy") == 0)
                  copy = 1;
               else
               {
                  print_err(
                     "line %d: unrecognized directive %s for %s",
                     lineno, d, s);
                  continue;
               }
            }

            // check regular file
            struct stat st;
            if (lstat(s, &st) != 0)
            {
               print_err("line %d: file not found %s: %s",
                  lineno, s, strerror(errno));
               continue;
            }

            if (!S_ISREG(st.st_mode))
            {
               print_err("line %d: not a regular file %s",
                  lineno, s);
               continue;
            }

            unsigned size = eol - s + sizeof(lnsvn) + 1;
            lnsvn *l = malloc(size);
            if (l == NULL)
            {
               print_err("MEMORY FAILURE");
               free_lnsvn(*base);
               *base = NULL;
               rtc = -1;
               break;
            }

            memset(l, 0, sizeof *l);
            strcpy(l->orig, s);
            l->basename = my_basename(l->orig);
            l->lineno = lineno;
            l->copy = copy;
            if (verbose >= verbose_load)
               printf("%4d %s%s\n", lineno,
                  copy? "COPY ": "", l->basename);

            // sorted insert; duplicate basenames are rejected
            lnsvn **pl;
            for (pl = base; *pl; pl = &(*pl)->next)
            {
               int cmp = strcmp(l->basename, (*pl)->basename);
               if (cmp < 0)
                  break;

               if (cmp == 0)
               {
                  print_err("dup name %s at lines %d and %d in %s",
                     l->basename, (*pl)->lineno, l->lineno, target);
                  free(l);
                  l = NULL;
                  rtc = 1;
                  break;
               }
            }

            if (l) // good
            {
               l->next = *pl;
               *pl = l;
            }
         }
      }

      int save_errno = 0;
      if (ferror(filepipe))
         save_errno = errno;
      if (fclose(filepipe) && save_errno == 0)
         save_errno = errno;
      if (save_errno)
      {
         rtc = 1;
         print_err("error reading from pipe: %s", strerror(save_errno));
      }
   }

   int wstatus;
   pid_t wpid = waitpid(pid, &wstatus, 0);
   if (wpid != pid)
   {
      print_err("waitpid failure: %s", strerror(errno));
   }
   else if (WIFEXITED(wstatus))
   {
      if (WEXITSTATUS(wstatus) != 0)
         rtc |= 1;
   }
   else if (WIFSIGNALED(wstatus))
   {
      int sig = WTERMSIG(wstatus);
      print_err("reading property interrupted by %s (%d)%s",
         strsignal(sig), sig, WCOREDUMP(wstatus)? ", core dumped": "");
      rtc = -1;
   }

   return rtc;
}

lnsvn **find_lnsvn(lnsvn **base, char const *name)
/*
* Find name in the sorted list.  Return the address of the pointer to
* the matching item, so that the caller can unlink it, or NULL.
*/
{
   assert(base);
   assert(name);

   lnsvn **p = base, *l;
   while ((l = *p) != NULL)
   {
      int cmp = strcmp(l->basename, name);
      if (cmp < 0)
      {
         p = &(*p)->next;
         continue;
      }

      return cmp > 0? NULL: p;
   }

   return NULL;
}

static int copyfile(lnsvn *l, struct stat *st_orig, uid_t st_uid)
/*
* Assume the current directory is target.  Copy the original file
* (instead of linking it).
*
* Return 0 if ok
*/
{
   int read_fd = open(l->orig, O_RDONLY);
   if (read_fd < 0)
   {
      print_err("Cannot read %s: %s", l->orig, strerror(errno));
      return -1;
   }

   // O_TRUNC in case a previous, longer copy could not be unlinked
   int write_fd = open(l->basename, O_WRONLY | O_CREAT | O_TRUNC,
      S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
   if (write_fd < 0)
   {
      print_err("Cannot write %s: %s", l->basename, strerror(errno));
      close(read_fd);
      return -1;
   }

   off_t size = st_orig->st_size;
   do
   {
      off_t out = sendfile(write_fd, read_fd, NULL, size);
      if (out <= 0)
         break;

      size -= out;
   } while (size > 0);

   int rtc = 0;
   if (size != 0)
   {
      print_err("Cannot copy %s: %s", l->orig, strerror(errno));
      rtc = 1;
   }

   if (fchown(write_fd, st_uid, -1))
   {
      print_err("Cannot chown %s: %s", l->basename, strerror(errno));
      rtc = 1;
   }

   // keep the original mtime, so that the next run finds the copy current
   const struct timespec times[2] =
   {{0, UTIME_OMIT},
   {st_orig->st_mtim.tv_sec, st_orig->st_mtim.tv_nsec}};

   if (futimens(write_fd, times) != 0)
   {
      print_err("Cannot set time on %s: %s",
         l->basename, strerror(errno));
      rtc = 1;
   }

   close(read_fd);
   close(write_fd);

   return rtc;
}

static int run_target(uid_t st_uid, char const *target)
/*
* Assume the current directory is target.  Obtain the linked list of
* original files, then read the target and check the corresponding file
* for each regular file.
*/
{
   lnsvn *base;
   int rtc = load_lsvn(target, &base);

   if (rtc >= 0)
   {
      DIR *dirp = opendir(".");
      if (dirp == NULL)
      {
         print_err("Cannot read %s: %s", target, strerror(errno));
         free_lnsvn(base);
         return -1;
      }

      struct dirent *dir;
      while ((dir = readdir(dirp)) != NULL)
      {
         struct stat st, st_orig;
         if (stat(dir->d_name, &st) != 0)
         {
            print_err("stat failure for %s/%s: %s",
               target, dir->d_name, strerror(errno));
            rtc = 1;
            continue;
         }

         if (!S_ISREG(st.st_mode))
            continue;

         lnsvn *l, **pl = find_lnsvn(&base, dir->d_name);
         if (pl == NULL)
         {
            print_err("%s/%s doesn't appear in property",
               target, dir->d_name);
            continue;
         }

         l = *pl;
         assert(strcmp(l->basename, dir->d_name) == 0);

         if (stat(l->orig, &st_orig) != 0)
         {
            print_err("cannot stat %s at line %d of %s",
               l->orig, l->lineno, target);
            *pl = l->next; // remove from list
            free(l);
            rtc = 1;
            continue;
         }

         if (!S_ISREG(st_orig.st_mode))
         {
            print_err("%s is not a regular file, at line %d of %s",
               l->orig, l->lineno, target);
            rtc = 1;
            *pl = l->next;
            free(l);
            continue;
         }

         // inode numbers are unique only within one device
         if (st.st_dev == st_orig.st_dev && st.st_ino == st_orig.st_ino)
         {
            if (verbose >= verbose_file)
               printf("%4d %s%s: same ino (%jd)\n", l->lineno,
                  l->copy? "COPY ": "", l->basename,
                  (intmax_t)st.st_ino);
            *pl = l->next;
            free(l);
            continue; // good
         }

         if (st_orig.st_mtim.tv_sec == st.st_mtim.tv_sec &&
            st_orig.st_mtim.tv_nsec == st.st_mtim.tv_nsec &&
            st_orig.st_size == st.st_size)
         {
            if (verbose >= verbose_file)
               printf("%4d %s: same time/size (%jd.%ld/%jd)\n",
                  l->lineno, l->basename,
                  (intmax_t)st.st_mtim.tv_sec,
                  st.st_mtim.tv_nsec, (intmax_t)st.st_size);
            *pl = l->next;
            free(l);
            continue; // good
         }

         /*
         * Files differ.  Copy or relink.
         */
         if (unlink(dir->d_name))
         {
            print_err("cannot unlink %s/%s: %s",
               target, dir->d_name, strerror(errno));
            rtc = 1;
         }

         if (st_orig.st_uid != st_uid)
            l->copy = 1;

         if (l->copy)
         {
            // don't overwrite rtc, it may hold an earlier soft error
            if (copyfile(l, &st_orig, st_uid) != 0)
               rtc = 1;
            else if (verbose >= verbose_file)
               printf("%4d COPY %s: copied\n",
                  l->lineno, l->basename);
         }
         else if (link(l->orig, dir->d_name))
         {
            print_err("cannot re-link %s/%s to %s: %s",
               target, dir->d_name, l->orig, strerror(errno));
            rtc = 1;
         }
         else if (verbose >= verbose_file)
            printf("%4d %s: relinked\n", l->lineno, l->basename);
         *pl = l->next;
         free(l);
      }
      closedir(dirp);

      // Now for the remaining items, which are not in the target yet
      for (lnsvn *l = base; l; l = l->next)
      {
         if (l->copy)
         {
            struct stat st;
            if (stat(l->orig, &st) != 0)
            {
               print_err("cannot copy %s to %s/%s: %s",
                  l->orig, target, l->basename, strerror(errno));
               rtc = 1;
            }
            else if (copyfile(l, &st, st_uid) != 0)
               rtc = 1; // copyfile already reported the error
            else
               printf("copied %s to %s\n", l->orig, target);
         }
         else if (link(l->orig, l->basename) == 0)
            printf("add link to %s in %s\n", l->orig, target);
         else
         {
            print_err("cannot link %s to %s/%s: %s",
               l->orig, target, l->basename, strerror(errno));
            rtc = 1;
         }
      }
   }

   free_lnsvn(base);
   return rtc;
}

static int run_target_cd(char const *target)
/*
* Change directory and call run_target.
*/
{
   if (target == NULL || *target == 0)
      return 1;

   struct stat st;
   int rtc = stat(target, &st);
   if (rtc != 0 || !S_ISDIR(st.st_mode))
   {
      print_err("%s %s", target, rtc?
         strerror(errno): "is not a directory");
      return -1;
   }

   char cwd[PATH_MAX];
   if (getcwd(cwd, sizeof cwd) == NULL)
   {
      print_err("getcwd failure: %s", strerror(errno));
      return -1;
   }

   if (chdir(target) != 0)
   {
      print_err("cannot chdir to %s: %s", target, strerror(errno));
      return -1;
   }

   rtc = run_target(st.st_uid, target);

   // further relative targets would be wrong if we cannot go back
   if (chdir(cwd) != 0)
   {
      print_err("cannot chdir back to %s: %s", cwd, strerror(errno));
      return -1;
   }

   return rtc;
}

int main(int argc, char *argv[])
{
   program_name = my_basename(argv[0]);

   int i, opt = 1, target = 0, errs = 0;
   for (i = 1; i < argc; ++i)
   {
      char *a = argv[i];
      if (a[0]== '-' && opt)
      {
         int ch;
         while ((ch = *(unsigned char*)++a) != 0)
         {
            switch (ch)
            {
               case '-': // end of options
                  opt = 0;
                  break;

               case 'h':
                  puts(help_string);
                  return 0;

               case 'v':
                  ++verbose;
                  break;

               default:
                  fprintf(stderr, "invalid option %c in %s\n",
                     ch, argv[i]);
                  ++errs;
                  break;
            }
         }
      }
      else
      {
         ++target;
         unsigned l = strlen(a);
         if (l > 0)
         {
            --l;
            if (l > 0 && a[l] == '/')  // remove trailing slash except root
               a[l] = 0;
            int rtc = run_target_cd(a);
            if (rtc)
            {
               ++errs;
               if (rtc < 0)
                  break;
            }
         }
      }
   }

   /*
   * Take LINKS as the default target.
   */
   if (errs == 0 && target == 0)
      errs += run_target_cd("LINKS");

   return errs != 0;
}

zero rights

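Besides -h for help, the only option the program takes is -v for verbosity. It can be given multiple times: once to report every file checked, twice to also list the entries loaded from the property.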