Next: References Up: Experiences with modularizing the Previous: Conclusions


Future Work

Work on the modularized core is currently on hiatus. Before it stopped, we briefly investigated the following possibilities for additional optional features, with estimates of the amount of work and the possible gain. The sources of the modularized core are available through the Tcl CVS at SourceForge, under the branch tag mod-8-3-4-branch. The license is the same as for the unmodified Tcl core itself.

The estimates are given both in lines of code (LOC) for the sources and as a percentage of the total size of the static library. Lines of code were counted using "wc -l"; nothing was done to take comments into account. Given the extensive commenting of the Tcl core code, the percentages given below can therefore be seriously off (overestimated). The percentages are based on the contents of Table 7 and Table 8, which list the sizes of the various object files.
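
As a rough illustration of the counting method described above, the following Python sketch counts newline characters the way "wc -l" does and expresses a subset of files as a percentage of the whole. The function names and the idea of passing in lists of file paths are assumptions made for this example; they are not part of the tooling actually used.

```python
from pathlib import Path


def count_lines(path):
    """Count newline bytes in a file, which is what `wc -l` reports."""
    return Path(path).read_bytes().count(b"\n")


def loc_share(touched, all_files):
    """Return (touched LOC, total LOC, percentage) for two lists of paths.

    Comments and blank lines are counted too, as in the text, so the
    figure overstates the amount of real code.
    """
    touched_loc = sum(count_lines(p) for p in touched)
    total_loc = sum(count_lines(p) for p in all_files)
    percent = 100.0 * touched_loc / total_loc if total_loc else 0.0
    return touched_loc, total_loc, percent
```

A comment-aware count would need a C tokenizer or at least a comment stripper, which is exactly the step that was skipped here.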


Table 7: Object sizes I
Object file   Size (bytes)   % of total
regcomp.o 40368 8.30
tclExecute.o 26540 5.45
tclIO.o 25296 5.20
tclCmdMZ.o 17160 3.53
tclBasic.o 16656 3.42
tclVar.o 16584 3.41
tclCompCmds.o 16256 3.34
tclCompile.o 16024 3.29
tclCmdAH.o 14228 2.92
tclNamesp.o 13712 2.82
tclUtf.o 13396 2.75
tclDate.o 12752 2.62
tclCmdIL.o 12660 2.60
tclFileName.o 12020 2.47
tclPosixStr.o 10972 2.25
tclInterp.o 10196 2.10
tclEncoding.o 10008 2.06
regexec.o 9296 1.91
tclParse.o 9252 1.90
tclUnixChan.o 9196 1.89
tclUtil.o 8600 1.77
tclIOCmd.o 7828 1.61
tclBinary.o 7804 1.60
tclScan.o 7380 1.52
tclProc.o 7288 1.50
tclUnixFCmd.o 6604 1.36
tclParseExpr.o 6452 1.33
tclUnixInit.o 6040 1.24
tclPipe.o 6040 1.24
tclPkg.o 5544 1.14
tclObj.o 5520 1.13
tclCompExpr.o 5512 1.13
tclFCmd.o 5272 1.08
tclStringObj.o 4728 0.97
tclUnixPipe.o 4196 0.86
tclTimer.o 4088 0.84
tclEvent.o 4036 0.83
Total (Tables 7 and 8 combined) 486648 100.00


Table 8: Object sizes II
Object file   Size (bytes)   % of total
tclListObj.o 3760 0.77
tclIOGT.o 3760 0.77
tclRegexp.o 3700 0.76
tclResult.o 3608 0.74
tclLoad.o 3556 0.73
tclUnixFile.o 3324 0.68
tclIOUtil.o 3300 0.68
tclMain.o 3296 0.68
tclHash.o 3296 0.68
tclStubInit.o 2960 0.61
tclLiteral.o 2784 0.57
tclNotify.o 2596 0.53
tclEnv.o 2396 0.49
tclClock.o 2304 0.47
tclLink.o 2240 0.46
tclGet.o 2164 0.44
regerror.o 1972 0.41
tclUnixNotfy.o 1784 0.37
tclIndexObj.o 1668 0.34
tclPreserve.o 1500 0.31
tclResolve.o 1228 0.25
tclThread.o 1216 0.25
tclUnixTime.o 1160 0.24
tclCkalloc.o 1116 0.23
tclLoadDl.o 1044 0.21
tclAsync.o 1028 0.21
tclIOSock.o 992 0.20
tclStubLib.o 980 0.20
tclHistory.o 920 0.19
tclPanic.o 876 0.18
tclAppInit.o 776 0.16
tclUnixSock.o 760 0.16
tclUnixEvent.o 752 0.15
tclAlloc.o 620 0.13
tclMtherr.o 616 0.13
regfree.o 560 0.12
tclUnixThrd.o 532 0.11
Total (Tables 7 and 8 combined) 486648 100.00
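
The binary percentages quoted for individual features can be checked against Tables 7 and 8 by summing the relevant object sizes. The sketch below does this for two features. The grouping of object files per feature is an assumption on our part, chosen because it reproduces the roughly 11.5 % (regular expressions) and 8.46 % (generic I/O) quoted in this section; the variable and function names are illustrative only.

```python
TOTAL_BYTES = 486648  # size of the whole static library, from the tables

# Object sizes in bytes, copied from Tables 7 and 8.
OBJECT_BYTES = {
    "regcomp.o": 40368, "regexec.o": 9296, "regerror.o": 1972,
    "regfree.o": 560, "tclRegexp.o": 3700,
    "tclIO.o": 25296, "tclIOCmd.o": 7828, "tclIOGT.o": 3760,
    "tclIOUtil.o": 3300, "tclIOSock.o": 992,
}


def binary_share(objects):
    """Percentage of the library taken up by the named object files."""
    return 100.0 * sum(OBJECT_BYTES[o] for o in objects) / TOTAL_BYTES


# Assumed grouping of object files per feature (not stated in the text).
regex = ["regcomp.o", "regexec.o", "regerror.o", "regfree.o", "tclRegexp.o"]
generic_io = ["tclIO.o", "tclIOCmd.o", "tclIOGT.o", "tclIOUtil.o", "tclIOSock.o"]

print(f"regular expressions: {binary_share(regex):.2f} %")       # 11.49 %
print(f"generic I/O:         {binary_share(generic_io):.2f} %")  # 8.46 %
```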

The whole interpreter (115 files matching the glob pattern tcl/{generic,unix}/*.c)[*] comes in at 3214256 LOC and 486648 bytes. These two figures are the 100 % against which the estimates below are measured.

  1. Removal of the event system at large (commands after, vwait, and update).

    File Touched  
    tclInt.h 1 line  
    tclTimer.c 1129 lines (all)  
    tclBasic.c 3 lines  
    tclEvent.c 547 lines (about half[*])  
    tclNotify.c 1081 lines (all)  
    tclUnixNotfy.c 1050 lines (all) [*]  
    tclUnixEvent.c 77 lines (all) [*]  
    tclWinNotify.c 522 lines (all) [*]  
    tclMacNotify.c 581 lines (all) [*]  
      Total 4991 lines (0.15 % of the sources)
      Binary 2.30 % of the static library

  2. Removal of the handling of binary data (command binary).

    File Touched  
    tclBasic.c 1 line  
    tclInt.h 2 lines  
    tclBinary.c 1552 lines (all)  
      Total 1555 lines (0.04 % of the sources)
      Binary 1.60 % of the static library

    Alternative: leave the Tcl_ObjType "tclByteArray" in.

    File Touched  
    tclBinary.c 1027 lines (about 2/3 of file)  
      Total 1030 lines (0.03 % of the sources)
      Binary 1.06 % of the static library

  3. Removal of the handling of times and dates (command clock).

    File Touched  
    tclBasic.c 1 line  
    tclInt.h 2 lines  
    tclClock.c 377 lines (all)  
    tclDate.c 1873 lines (all)  
      Total 2253 lines (0.07 % of the sources)
      Binary 3.09 % of the static library

  4. Removal of the package system (command package).

    File Touched  
    tclBasic.c 1 line  
    tclInt.h 2 lines  
    tclPkg.c 979 lines  
      Total 982 lines (0.03 % of the sources)
      Binary 1.14 % of the static library

  5. Removal of new string manipulation functionality (command string).

    File Touched  
    tclCmdMZ.c 1331 lines[*]  
      Total at most 0.04 % of the sources
      Binary 1.58 % of the static library

  6. Reverting lsort to a quicksort-based implementation, cutting out our own mergesort-based implementation.

    File Touched  
    tclCmdIL.c 678 lines (lsort)  
      Total at most 0.02 % of the sources
      Binary 0.54 % of the static library

  7. Removal of the bytecode compiler.

    File Touched  
    tclCompCmds.c 2043 lines (all)  
    tclCompExpr.c 1051 lines (all)  
    tclCompile.c 3414 lines (all)  
    Entrypoints ... 300 lines (estim.)  
      Total 6808 lines (0.21 % of the sources)
      Binary 7.76 % of the static library

  8. Removal of the bytecode executor. This implies the removal of the bytecode compiler: without the execution of bytecodes, compiling them makes no sense.

    File Touched  
    tclCompCmds.c 2043 lines (all)  
    tclCompExpr.c 1051 lines (all)  
    tclCompile.c 3414 lines (all)  
    tclExecute.c 6412 lines (all)  
    Entrypoints ... 300 lines (estim.)  
      Total 13220 lines (0.41 % of the sources)
      Binary 13.21 % of the static library

  9. Removal of regular expressions.

    File Touched  
    regc_color.c 17775 lines  
    regc_cvec.c 5094 lines  
    regc_lex.c 24495 lines  
    regc_locale.c 34453 lines  
    regc_nfa.c 36234 lines  
    regcomp.c 59492 lines  
    rege_dfa.c 17820 lines  
    regerror.c 3515 lines  
    regexec.c 28360 lines  
    regfree.c 2086 lines  
    regfronts.c 2394 lines  
    tclRegexp.c 1029 lines  
      Total 232747 lines (7.24 % of the sources)
      Binary 11.5 % of the static library

  10. Removal of namespaces (command namespace).

    File Touched
    tclNamesp.c 3916 lines (mostly)[*]
    tclVar.c 4813 lines
    tclParse.c 2357 lines
    tclParseExpr.c 1870 lines
    various (set, proc) about 1000 lines

    A simple cut-out of this feature is not possible. Instead, we will have to rewrite parts of the parser, and of commands like set and proc, to remove the special handling of the colon (:), the namespace separator character, from the system.

    About 13956 LOC have to be touched for this, which is about 0.43 % of the sources. About 2.82 % of the binary is definitely removed.

  11. Removal of encodings (command encoding). We did not count the size of the then-irrelevant encoding files, as we did not count them as part of the whole code base either.

    File Touched
    tclEncoding.c 2871 lines

    How much is removed from the file above depends on the chosen model, of which we have two:

    1. Deactivating encodings completely may remove about 80-90 % of that file. This is circa 1.06 % of the binary.

      Not everything might be removed because we believe that the best way to remove UTF8 completely is to rewrite the Utf <-> External converter functions and throw away the rest. That way we don't have to think about all the other places which do UTF8.

      If we remove this completely, we have to touch many more places throughout the whole code, most notably the channel system. The latter would bring up the number of LOC removed or rewritten, but would also take much longer. We currently have no good LOC estimate for this scenario.

      As a first approximation we grepped the sources for "Tcl_Utf", which gives us 314 locations in 40 files. We guesstimate that each location translates into 1-4 lines of code touched directly, plus, depending on context, maybe 5-20 others around each location which have to change too. That would be between 314 and 6280 LOC changed, i.e. replaced with different, non-UTF, code.

      The number of 5-20 other lines depending on context might be an underestimation for the channel system. This part of the core will very likely need a complete reorganization to allow usage both with and without encodings. This would be 8389 lines changed in tclIO.c. Changed, not cut!

      Note, however, that tclIO.c, at 5.20 % of the binary, is also the third-largest file right now.

      Summary: About 17540 LOC have to be touched for this, which is about 0.54 % of the whole sources.

    2. Just deactivating the loading of external encodings, by contrast, may remove 60-70 % of that file.

  12. The generic part of the I/O system.

    While this subsystem, at 8.46 % of the binary code, is the third-largest part of the core after regular expressions and the engine for the execution of bytecodes, it is also a tangled web and, in our opinion at least, very difficult to unravel.

    This is especially so because it is heavily influenced by the choice of whether to use encodings, and by whether it has to support the notifier, i.e. file events.

    No estimates were made for this part of the core.
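
The source percentages in the list above can be reproduced with simple integer arithmetic. The sketch below does so for item 1 and for the summary of item 11, model 1. It assumes that the published percentages were truncated rather than rounded to two decimals (which is what the figures suggest, e.g. 17540 LOC giving 0.54 %), and it follows the text in using 1 and 20 lines per grep location as the lower and upper bound. The function names are illustrative only.

```python
TOTAL_LOC = 3214256  # whole interpreter, as stated in the text


def percent_hundredths(lines):
    """Share of TOTAL_LOC in hundredths of a percent, truncated."""
    return lines * 10000 // TOTAL_LOC


def fmt(lines):
    """Format a line count together with its truncated percentage."""
    h = percent_hundredths(lines)
    return f"{lines} lines, {h // 100}.{h % 100:02d} %"


# Item 1 (event system): per-file line counts from the table.
event_system = [1, 1129, 3, 547, 1081, 1050, 77, 522, 581]
print(fmt(sum(event_system)))   # 4991 lines, 0.15 %

# Item 11, model 1 (encodings removed completely).
locations = 314                              # grep hits for "Tcl_Utf"
low, high = locations * 1, locations * 20    # range quoted in the text
encoding_total = 2871 + high + 8389          # tclEncoding.c + upper bound + tclIO.c
print(low, high)                # 314 6280
print(fmt(encoding_total))      # 17540 lines, 0.54 %
```

Integer arithmetic is used here so that the truncation is exact and not subject to floating-point rounding.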

It was noted before that Source Navigator crashed when processing the Tcl sources. This is no longer the case with the newest version, 5.1. Our ability to determine which parts of the code have to be made conditional, or depend on more than one feature, is therefore greatly enhanced. Of course, we will have to write special scripts which mine the dependency database for the information we need. This, however, is less difficult and less error-prone than searching through the sources ourselves.
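
To make the idea of such a mining script concrete, here is a minimal Python sketch. It does not use Source Navigator's actual database format, which is not described here. Instead it assumes the cross-reference data has already been exported as (file, symbol) pairs, and that each source file has been assigned to at most one optional feature. The file-to-feature mapping and the symbol names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from source file to the optional feature it implements.
FILE_FEATURE = {
    "tclTimer.c": "events",
    "tclNotify.c": "events",
    "tclEncoding.c": "encodings",
    "tclBinary.c": "binary",
}


def shared_symbols(references):
    """Return {symbol: features} for symbols used by two or more features.

    `references` is an iterable of (file, symbol) pairs, meaning that
    code in `file` refers to `symbol`. Files without an assigned
    feature are treated as core code and ignored.
    """
    users = defaultdict(set)
    for file, symbol in references:
        feature = FILE_FEATURE.get(file)
        if feature is not None:
            users[symbol].add(feature)
    return {s: sorted(f) for s, f in users.items() if len(f) > 1}


# Made-up records for illustration only, not real cross-reference data.
refs = [
    ("tclTimer.c", "HypotheticalHelperA"),
    ("tclEncoding.c", "HypotheticalHelperA"),
    ("tclNotify.c", "HypotheticalHelperB"),
    ("tclIO.c", "HypotheticalHelperB"),
]
print(shared_symbols(refs))  # {'HypotheticalHelperA': ['encodings', 'events']}
```

Symbols reported by such a script are the ones whose code would need to be guarded by more than one feature switch.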

Such help is especially important for a future up-port of the modularization changes to 8.4. The internal organization of the code has changed so much that the patches we could generate by comparing an unmodified with a modularized 8.3.4 core are essentially useless. The only parts which can be carried over relatively easily are the changes that reduce the consumption of stack space.


andreas_kupries@users.sourceforge.net