Skip to content

Conversation

@svartkanin
Copy link
Collaborator

@svartkanin svartkanin commented Mar 20, 2025

Currently the defined user password is stored in plaintext in the user_credentials.json file if the configuration is saved. A security concern was originally reported in #1111 and another related issue to allow users to define hashes instead of passwords #1029.

Based on this announcement in 2023-09-22 archlinux is using yescrypt as by default https://archlinux.org/news/changes-to-default-password-hashing-algorithm-and-umask-settings/.

After some research there doesn't seem to be a python package capable of generating a yescrypt hash. But as @Torxed suggested down in the comments we can load libcrypt.so and call the crypt function. This generally used on Linux systems for encryption and therefore a reliable option.

So far the password for a defined user account has been set with chpasswd username:password, however, the command also allows to pass a password hash with chpasswd --encrypted, which means we can first generate a yescrypt hash and then pass it to chpasswd

{
    "!root-password": "super_pwd",
    "!users": [
        {
            "!password": "password",
            "sudo": true,
            "username": "username",
            "groups": ["wheel"]
        }
    ]
}

New

{
    "root_enc_password": "<yescrypt_hash>",
    "users": [
        {
            "enc_password": "<yescrypt_hash>",
            "sudo": true,
            "username": "user_name",
            "groups": ["wheel"]
        }
    ]
}

Not solved (yet)

The disk encryption password will continue to be stored in cleartext as I haven't seen any obvious way to use a hash in luks nor for HSM devices.

@codefiles
Copy link
Contributor

Would you please create a separate pull request with just the fix for #1591? The fix for that is only a few lines (-G wheel conditional on sudo parameter). Thank you.

from .profile_menu import ProfileMenu, select_greeter, select_profile
from .profiles_handler import profile_handler

__all__ = [
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@correctmost I know you just merged this change but I run into a circular dependency when loading profile configuration from a config file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to delete :). The profile handler and profile model code still need more untangling.

@svartkanin svartkanin marked this pull request as ready for review March 23, 2025 09:55
@svartkanin
Copy link
Collaborator Author

@Torxed should be ready for a review

@codefiles
Copy link
Contributor

codefiles commented Mar 23, 2025

Relevant excerpt from chpasswd(8):

By default, passwords are encrypted by PAM, but (even if not recommended) you can select a different encryption method with the -e, -m, or -c options.

If this is merged, all user passwords created with archinstall will use bcrypt instead of the Arch Linux default of yescrypt. If Arch Linux staff decided on yescrypt it should not be up to archinstall to change this.

This also does not fix #1111 if disk encryption passwords will still be in plaintext.

Sorry but this does not have my support.

@svartkanin
Copy link
Collaborator Author

I'm aware of the documentation and that the PR deviates from the default behaviour. The reason default values exist is to provide a single set option for a user without having to let them choose between multiple options that would all be viable choices. Same in this case, Arch had to pick a single option to go with as default, but that doesn't mean that other options are necessarily worse. bcrypt is widely used as a password hashing algorithm and has been battle proven.

As stated in the description I couldn't find a way to either use PAM or create yescrypt hashes with the provided tools, therefore I opted for a viable alternative, which may violate the "Arch approach" and in that case will require another round of brainstorming and exploring a way forward.

Regarding the issue #1111 at hand, it is correct that this PR does not solve the entire problem with plaintext passwords but only addresses the user account ones as also stated in the description.

Looking at why archinstall has plaintext passwords stored in a file in the first place the main use case seems to be that users can point the installer to the previously generated configuration files and start the installation without having to manually go through all steps again.

A potential option to workaround it completely could be:

  • Move non-sensitive config information (username, sudo etc.) into user_configuration.json
  • Don't write a user_credentials.json anymore
    • Manual input -> If a configuration file is provided as argument the user will be prompted to provide a user account password if username is in configuration file and disk encryption password if disk encryption was setup
    • Add argument options for user account passwords and disk encryption passwords (if specified it will not prompt for those manually)
    • Provide alternative hooks to fetch the passwords from a storage location?

@Torxed
Copy link
Member

Torxed commented Mar 24, 2025

I'm having a look! Overall this is a very welcome change.
Let me have a stab at the yescrypt thing.

@Torxed
Copy link
Member

Torxed commented Mar 24, 2025

So I'm not an expert in C, but perhaps this could be a starting point:

import ctypes
import ctypes.util

# Try to locate libxcrypt
lib_path = ctypes.util.find_library("xcrypt") or ctypes.util.find_library("crypt")
if not lib_path:
    raise RuntimeError("libxcrypt not found")

libxcrypt = ctypes.CDLL(lib_path)

libxcrypt.crypt.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libxcrypt.crypt.restype = ctypes.c_char_p

password = b"some password"
salt = b"$y$j9T$KbYPzGJXFPvMbHGhKSD8x/"

hashed = libxcrypt.crypt(password, salt)

Should generate something along the lines of:

b'$y$j9T$KbYPzGJXFPvMbHGhKSD8x/$Kip3ZdzbqIHxatfBqYQ1R5t1gHo2w1Z5U6pjNLI5Ed/'

Would this help? If so, perhaps something along the lines of:

import base64
import ctypes
import ctypes.util
import os

def gen_yescrypt_salt(cost=10):
	if not (1 <= cost <= 11):
		raise ValueError("Cost must be between 1 and 11.")

	param_string = f"j{cost}T"

	b64_salt = base64.b64encode(os.urandom(16)).decode("ascii")

	# libxcrypt expects a base64 salt without padding and trimmed to 22 chars
	b64_salt = b64_salt.replace("+", ".").replace("=", "")[:22]

	return f"$y${param_string}${b64_salt}"

def gen_yescrypt_hash(password, salt=None):
	if salt is None:
		salt = gen_yescrypt_salt()

	if type(salt) is str:
		salt = salt.encode('UTF-8')

	if type(password) is str:
		password = password.encode('UTF-8')

	lib_path = ctypes.util.find_library("xcrypt") or ctypes.util.find_library("crypt")
	libxcrypt = ctypes.CDLL(lib_path)
	libxcrypt.crypt.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
	libxcrypt.crypt.restype = ctypes.c_char_p

	return libxcrypt.crypt(password, salt)

What I'm mostly unsure about, is how I can verify if the output from crypt() is actually honoring the parameters of the salt, so that it is actually a yescrypt hash.

@svartkanin
Copy link
Collaborator Author

That's some serious black magic :)
I can try to work with that!

Given that ur does tge problem of the cleartext disk encryption password remains. Any thoughts on the option I mentioned to pass passwords via the arguments?

@Torxed
Copy link
Member

Torxed commented Mar 24, 2025

That's some serious black magic :)
I can try to work with that!

It probably looks more scary than it is hehe.
Pretty standard "import C library", but since the library doesn't expose any Py* objects like standard Python C lib, we have to keep talking "C language" with it instead of import cryptx.so, hence the ctypes import (in case anyone is not familiar with this type of import magic).

We'll need to update the PKGBUILD to include one more dependency: libxcrypt.

By doing this approach, we'll import and utilize the same library shadow uses for yescrypt via the same dependency:

screenshot

Move non-sensitive config information (username, sudo etc.) into user_configuration.json

This is how it was back in the days, I don't mind reverting to it.

  • Don't write a user_credentials.json anymore
Personal Rant

I'm very torn here. Because one side of the aisle it makes sense to not write the credentials because it's easy to handle it incorrectly, leaving credentials vulnerable even if we use a hash algorithm.
On the other side, if handled correctly it's more or less perfectly safe to save it, and re-use it.

Ultimately I would hope for a passwordless approach using HSM's, but not everyone have access to such devices.

Long story short, perhaps asking for a password when loading a configuration is the most sane and secure approach we can support?

As long as we don't use --config and somehow the user gets confused about setting a password I'm fine with that approach, triggering your suggestion:

Manual input -> If a configuration file is provided as argument the user will be prompted to provide a user account password

@svartkanin
Copy link
Collaborator Author

Or we keep storing all in plaintext passwords and encrypt the credentials file with a master password that needs to be entered on save. And when the credentials file is passed we prompt for it?
That way only 1 password prompt may appear on launch and no need for a solution for the - disk encryption password

@Torxed
Copy link
Member

Torxed commented Mar 24, 2025

Or we keep storing all in plaintext passwords and encrypt the credentials file with a master password that needs to be entered on save. And when the credentials file is passed we prompt for it?

I like that approach! There's a overwhelmingly high chance that people will forget it ^^
We could also encrypt the file with a HSM, so there's multiple options here which is great.

The only drawback will be that we'll need to have to have additional handling of the different scenarios. But I like the idea of this :)

Basically a vault, and inside hashed credentials :)

@svartkanin
Copy link
Collaborator Author

svartkanin commented Mar 24, 2025

I like that approach! There's a overwhelmingly high chance that people will forget it ^^

Indeed, or they will potentially use the same password as they gave their user account ;)

The only drawback will be that we'll need to have to have additional handling of the different scenarios. But I like the idea of this :)

I think the friction won't be too much, we will need some form of prompt when a user selects to save the credentials file, and then some handling of decryption when the file gets passed as arg

Basically a vault, and inside hashed credentials :)

Will we still need to hash the credentials inside? We can try to hash the user account password with your approach above but the disk encryption one will be in clear text

I don't have access to the ISO atm but looking at https://gitlab.archlinux.org/archlinux/archiso/-/blob/4b709bcd5fc5e426ba75f72e1b80e59922d7c175/configs/releng/packages.x86_64 I can't see libxcrypt being explicitly included, @Torxed do you know if it's a transient dependency and available on the ISO? I suspect it must be as chpasswd hashes passwords...

@Torxed
Copy link
Member

Torxed commented Mar 25, 2025

Will we still need to hash the credentials inside? We can try to hash the user account password with your approach above but the disk encryption one will be in clear text

I think we should, in case we want to offer unencrypted JSON files for credentials in the future and while passing to the command-line (remember that the password if using --chpasswd --encrypted will be visible in cmd_history.txt, which is fine.. but having it hashed would make things better).

Also, the disk password is fine. But we should perhaps do (a bit of pseudo code):

with tempfile.TemporaryFile() as tmp_file:
    luks2(pwfile=tmp_file)

As it would lower the risk of us forgetting it somewhere.
And then in the encrypted creds.json we can just store it as a plaintext for now.

I can't see libxcrypt being explicitly included, @Torxed do you know if it's a transient dependency

It is, via: base <- shadow <- libxcrypt

But just to be sure, for instance in docker containers, we should add it as an explicit dependency. It also allows other packages to explicitly see that we depend on it so any updates to that library will tell them to notify us about any breaking changes :)

@svartkanin
Copy link
Collaborator Author

Okay sounds good, in that case I'll rework the PR to switch to the yescrypt hash instead of bcrypt. And in a future PR address the encryption of the file and potential arguments.

@svartkanin
Copy link
Collaborator Author

@Torxed Update the PR with the yescrypt hash generator

Comment on lines 162 to 193
@cached_property
def _libcrypt(self):
libc = ctypes.CDLL("libcrypt.so")

@classmethod
def _parse_backwards_compatible(cls, config_users: dict[str, dict[str, str]], sudo: bool) -> list['User']:
if len(config_users.keys()) > 0:
username = list(config_users.keys())[0]
password = config_users[username]['!password']
libc.crypt.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.crypt.restype = ctypes.c_char_p

libc.crypt_gensalt.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.crypt_gensalt.restype = ctypes.c_char_p

return libc

def _gen_yescrypt_salt(self) -> bytes:
salt_prefix = b'$y$'
rounds = 9
entropy = os.urandom(64)

salt = self._libcrypt.crypt_gensalt(salt_prefix, rounds, entropy, 64)
return salt

def _gen_yescrypt_hash(self, plaintext: str) -> str:
"""
Generation of the yescrypt hash was used from mkpasswd included
in the whois package which seems to be one of the few tools
that actually provides the functionality
https://github.com/rfc1036/whois/blob/next/mkpasswd.c
"""
enc_plaintext = plaintext.encode('UTF-8')
salt = self._gen_yescrypt_salt()

if password:
return [User(username, password, sudo)]
yescrypt_hash = self._libcrypt.crypt(enc_plaintext, salt)
return yescrypt_hash.decode('UTF-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I have shown from documentation (#3276 (comment)), it is PAM that is responsible for encrypting passwords.

Here is the relevant code:
https://github.com/linux-pam/linux-pam/blob/3da5496be24031a5c806752d9f1892100346d8ff/modules/pam_unix/passverify.c#L494-L514

The rounds variable is handled here:
https://github.com/linux-pam/linux-pam/blob/3da5496be24031a5c806752d9f1892100346d8ff/modules/pam_unix/support.c#L189-L244

So for rounds the YESCRYPT_COST_FACTOR is read in from /etc/login.defs, set to a default of 5 if it is not set, and has enforcement of a sane value.

The code you wrote has rounds set to 9.

This is from just a quick glance over this. If you are going to emulate this, please put in the proper time to understand it so that the implementation is correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out, I'll update the rounds accordingly

Copy link
Contributor

@codefiles codefiles Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also recheck the rest of this logic against the relevant PAM code. I was only using rounds to demonstrate that the code does not properly emulate the handling done by PAM, it may not be the only discrepancy. I did not check beyond that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, that's on me as well seeing as I laid the foundation for this code.

I believe that if we can set sane defaults that work with the login after an install, then that's good enough for a v1.0 of this.
As the overall benefit from using hashes outweigh 1:1 code logic with PAM or whomever, as long as it works. Separate PR's can be raised for such cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you got #3276 (comment) from an LLM chat bot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, why?

It was mostly copy paste from old garbage I had laying around from 5 years ago plus random googling on this specific hash function parameters.

As I've never worked with yescrypt that was new to me if that's what you're fishing for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it based on the appearance of the first example, the brevity of it and no functions. I took a look at the script you linked and admit I was wrong. Thanks for addressing it.

Copy link
Contributor

@codefiles codefiles Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend moving this chunk of code to its own module, something like libcrypt.py, and using this python source code as an example of how the code should be structured:
https://github.com/python/cpython/blob/bfc292abc10091a9298027596d107652ee4fe166/Lib/_pyrepl/_minimal_curses.py

Copy link
Contributor

@codefiles codefiles Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An implementation of the above recommendation:

libcrypt.py

import ctypes


class error(Exception):
    pass


libcrypt = ctypes.CDLL('libcrypt.so')

libcrypt.crypt.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libcrypt.crypt.restype = ctypes.c_char_p

libcrypt.crypt_gensalt.argtypes = [ctypes.c_char_p, ctypes.c_ulong, ctypes.c_char_p, ctypes.c_int]
libcrypt.crypt_gensalt.restype = ctypes.c_char_p


def crypt(phrase: bytes, setting: bytes) -> bytes:
    result = libcrypt.crypt(phrase, setting)
    if result is None:
        raise error('crypt() returned NULL')
    return result


def crypt_gensalt(prefix: bytes, count: int) -> bytes:
    setting = libcrypt.crypt_gensalt(prefix, count, None, 0)
    if setting is None:
        raise error('crypt_gensalt() returned NULL')
    return setting

Simple use example:

import sys

import libcrypt


def main() -> int:
    if len(sys.argv) < 2:
        return 2

    password = sys.argv[1]
    prefix = b'$y$'
    rounds = 5

    try:
        setting = libcrypt.crypt_gensalt(prefix, rounds)
        hashed = libcrypt.crypt(password.encode(), setting)
    except libcrypt.error as err:
        print(err, file=sys.stderr)
        return 1

    print(hashed.decode())
    return 0


if __name__ == '__main__':
    sys.exit(main())

@codefiles
Copy link
Contributor

If a user interacting with archinstall through the TUI inputs a password for a root or user account, will the password be encrypted using this code and bypass the process in which PAM handles encryption of the password even if the user does not intend to save and reuse credentials?

@svartkanin
Copy link
Collaborator Author

If a user interacting with archinstall through the TUI inputs a password for a root or user account, will the password be encrypted using this code and bypass the process in which PAM handles encryption of the password even if the user does not intend to save and reuse credentials?

Correct, I opted for consistency to keep the hashing of passwords in a single place. I thought it might confuse users if they specify a password them PAM hashes and sets it in the shadow file and the config contains a different hash generated by archinstall.

@codefiles
Copy link
Contributor

The PKGBUILD will need libcrypt.so and libxcrypt as a dependencies. Both are in the PKGBUILD for PAM:
https://gitlab.archlinux.org/archlinux/packaging/packages/pam/-/blob/ace0ff539c0c8ff8e776757c5be273ffa376d865/PKGBUILD#L12-22

@svartkanin
Copy link
Collaborator Author

@codefiles Update the PR.
Regarding the linked salt creation PAM attempts to read /etc/login.defs as you pointed out https://github.com/linux-pam/linux-pam/blob/3da5496be24031a5c806752d9f1892100346d8ff/modules/pam_unix/support.c#L200 looking at the logic it'll consider YESCRYPT_COST_FACTOR if it is not commented out. Looking at the current ISO this value is set to

#YESCRYPT_COST_FACTOR=5

which means it'll be ignore and the default value of 5 applied. I've hardcoded it now to 5 instead of 9 but did not implement the actual reading of the value from login.defs (as that might change in the future). If you still consider that necessary I can add that functionality

@codefiles
Copy link
Contributor

codefiles commented Apr 7, 2025

The ISO just reflects how /etc/login.defs is provided by shadow. Yes I know, YESCRYPT_COST_FACTOR is commented out by default as provided by the shadow repository and Arch Linux does not change it. The point is that it is checked for by PAM, that means if it is ever changed, the hard coded 5 will not match what is in /etc/login.defs. A similar problem would occur if in the future Arch Linux decides on a different ENCRYPT_METHOD. As these are unlikely to change, I am good with the code as it is now.

I would have preferred you to keep crypt() and crypt_gensalt() as in my example because they were intended to be wrappers of those C functions. Good enough though.

@svartkanin
Copy link
Collaborator Author

Thanks for the feedback, I've included attempting to read the value from /etc/login.defs with the respective handling of values as in https://github.com/linux-pam/linux-pam/blob/3da5496be24031a5c806752d9f1892100346d8ff/modules/pam_unix/support.c#L225-L229

@Torxed Torxed merged commit 4f1d1b4 into archlinux:master Apr 8, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants